Abstract

In  the  last  ten  years  declaration-free  programming  languages  with  a  polymorphic  typing  disci- 
pline (ML,  B)  have  been  developed  to  approximate  the  flexibility  and  conciseness  of  dynamically 
typed  languages  (LISP,  SETL)  while  retaining  the  safety  and  execution  efficiency  of  conventional 
statically  typed  languages  (Algol68,  Pascal).  These  polymorphic  languages  can  be  type  checked 
at  compile  time,  yet  allow  functions  whose  arguments  range  over  a  variety  of  types. 

We  investigate  several  polymorphic  type  systems,  the  most  powerful  of  which,  termed  Milner- 
Mycroft  Calculus,  extends  the  so-called  let-polymorphism  found  in,  e.g.,  ML  with  a  polymorphic 
typing  rule  for  recursive  definitions.  We  show  that  semi-unification,  the  problem  of  solving 
inequalities over first-order terms, characterizes type checking in the Milner-Mycroft Calculus up to
polynomial-time equivalence, even in the restricted case where nested definitions are disallowed. This permits
us  to  extend  some  infeasibility  results  for  related  combinatorial  problems  to  type  inference  and 
to  correct  several  claims  and  statements  in  the  literature. 

We  prove  the  existence  of  unique  most  general  solutions  of  term  inequalities,  called  most 
general  semi-unifiers,  and  present  an  algorithm  for  computing  them  that  terminates  for  all  known 
inputs  due  to  a  novel  "extended  occurs  check".  We  conjecture  this  algorithm  to  be  uniformly 
terminating  even  though,  at  present,  general  semi-unification  is  not  known  to  be  decidable.  We 
prove  termination  of  our  algorithm  for  a  restricted  case  of  semi-unification  that  is  of  independent 
interest. 

Finally,  we  offer  an  explanation  for  the  apparent  practicality  of  polymorphic  type  inference 
in  the  face  of  theoretical  intractability  results. 


Chapter  1 

Introduction 

1.1      Problem  Background 

Most  programming  languages  provide  the  notion  of  types  as  their  most  fundamental  abstraction 
from  the  unstructured  universe  of  basic  computer  structures.  While  some  languages  perform 
type  checking  -  checking  for  type  consistent  usage  of  program  objects  -  at  run-time  (e.g.,  LISP, 
PROLOG, APL), others do it at compile time (Pascal, Ada, ML, etc.). Doing it at compile
time  has  the  advantage  that  type  errors,  a  common  form  of  errors,  are  detected  before  the 
program  is  run.  This  usually  comes  at  the  price  of  cumbersome  explicit  type,  variable  and 
other  declarations.  Recently  languages  such  as  ML  [32]  have  been  designed  that  try  to  combine 
the safety of compile-time type checking with the flexibility of declaration-less programming
by  inferring  type  information  from  the  program  rather  than  insisting  on  extensive  declarations. 
ML's  type  discipline  allows  for  definition  and  use  of  (parametric)  polymorphic  functions;  that  is, 
functions  that  operate  uniformly  on  arguments  that  may  range  over  a  variety  of  types. 

A  peculiarity  in  ML  is  that  occurrences  of  a  recursively  defined  function  inside  its  definition 
body  can  only  be  used  monomorphically  (all  of  them  have  to  have  identically  typed  arguments 
and their results are typed identically), whereas occurrences outside its body can be used
polymorphically (with arguments of different types). This thesis studies the computational
implications for type inference in an extension of ML's typing system, which we primarily attribute to
Mycroft  [85],  that  treats  recursively  defined  functions  equally  and  uniformly  inside  and  outside 
their  bodies. 

Although  the  motivation  for  studying  Mycroft's  extension  to  ML's  typing  discipline  may  seem 
rather  esoteric  and  of  purely  theoretical  interest,  it  stems  from  practical  considerations.  In  ML 
many  typing  problems  attributable  to  the  monomorphic  recursive  definition  constraint  can  be 
avoided  by  appropriately  nesting  function  definitions  inside  the  scopes  of  previous  definitions. 
Since ML provides a form of polymorphic definition called let-polymorphism, in most cases
nesting definitions is, indeed, a workable scheme. Some languages, however, do not provide
scoped  nesting,  but  only  top-level  definition  of  functions.  Consequently,  all  these  definitions 
have  to  be  considered,  in  general,  as  a  single,  mutually  recursive  definition.  For  example,  B, 
SETL,  and  Prolog  do  not  provide  nested  scopes.  Adopting  ML's  monomorphic  typing  rule 
for  recursive  definitions  in  these  languages  would  preclude  polymorphic  usage  of  any  defined 
function inside any definition. In particular, since logic programs, as observed in [86], can be
viewed  as  massive  mutually  recursive  definitions,  using  an  ML-style  type  system  would  eliminate 
polymorphism  from  strongly  typed  logic  programming  languages  almost  completely.  Mycroft's 
extension,  on  the  other  hand,  permits  polymorphic  usage  in  such  a  language  setting. 


In  many  cases  it  is  possible  to  investigate  the  dependency  graph  ("call  graph")  of  mutually 
recursive  definitions  and  process  its  maximal  strong  components  in  topological  order  thus  simu- 
lating polymorphically  typed,  nested  let-definitions,  but  this  is  undesirable  for  several  reasons: 

1.  The  resulting  typing  discipline  cannot  be  explained  in  a  syntax-directed  fashion,  but  is 
rather  reminiscent  of  data-flow  oriented  reasoning.  This  runs  contrary  to  structured  pro- 
gramming and  program  understanding.  For  example,  finding  the  source(s)  of  typing  errors 
in  the  program  text  is  made  even  more  difficult  than  the  already  problematical  attribution 
of type errors to source code in ML-like languages [51, 120].

2.  The  topological  processing  does  not  completely  capture  the  polymorphic  typing  rule.  My- 
croft  reports  on  a  mutually  recursive  definition  he  encountered  in  a  "real  life"  programming 
project that could not be typed in ML, but could be typed by using the extended polymorphic
typing rule for recursive definitions [85, section 8].
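The call-graph preprocessing mentioned before the list above can be sketched in Python. This is only an illustration and not part of the thesis: the function name `strong_components`, the dictionary encoding of the call graph, and the sample graph are assumptions made here. The sketch uses Tarjan's algorithm, which emits a strong component only after every component it calls, so the groups come out in the order in which simulated nested let-definitions would have to be typed.

```python
from typing import Dict, List


def strong_components(calls: Dict[str, List[str]]) -> List[List[str]]:
    """Tarjan's algorithm: return the strongly connected components of a
    call graph, each component listed after all components it calls."""
    index: Dict[str, int] = {}   # discovery number of each function
    low: Dict[str, int] = {}     # smallest discovery number reachable
    stack: List[str] = []
    on_stack = set()
    result: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in calls.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            result.append(sorted(component))

    for v in calls:
        if v not in index:
            visit(v)
    return result


# Hypothetical call graph: squarelist calls map, map calls itself,
# and even/odd are mutually recursive.
call_graph = {
    "squarelist": ["map"],
    "map": ["map"],
    "even": ["odd"],
    "odd": ["even"],
}
groups = strong_components(call_graph)
```

Each group would then be typed as one (monomorphic) recursive definition and generalized before the next group is processed. As the list above points out, this ordering is a data-flow notion rather than a syntax-directed one.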

1.2  An  Example 

As  an  illustration  of  the  monomorphic  typing  rule  for  recursive  definitions  consider  the  following 
standard definition of map and squarelist in Standard ML, taken directly from [85].

fun map f l = if null l then nil else f (hd l) :: map f (tl l)
and
squarelist l = map (fn x: int => x * x) l;

As it is written, this is a simultaneous definition of map and squarelist even though squarelist is
not used in the definition of map. An ML-style type checker would produce the types

map: (int -> int) -> (int list -> int list)
squarelist: int list -> int list

even though we would expect the type of map to be

map: $\forall\alpha.\forall\beta.(\alpha \to \beta) \to (\alpha\ \mathit{list} \to \beta\ \mathit{list})$,

which is the type produced by defining first map and then squarelist sequentially.

If we were to use map in another line of the same mutually recursive definition with an argument
type different from int list, we would even get a type error. This peculiarity comes from the
fact that the Milner Calculus permits recursively defined functions to be used monomorphically
only  inside  their  bodies  whereas  they  may  still  be  used  polymorphically  —  with  arguments  of 
different  types  —  outside  their  bodies. 

1.3  Outline  of  thesis 

At  the  core  of  this  thesis  is  a  study  of  the  type  inference  problem  of  ML's  type  system  extended 
with a polymorphic typing rule, termed Milner-Mycroft Calculus here, and some of its relatives.

Motivated  by  the  well-known  reduction  of  simple  type  inference  to  first-order  unification  we 
relate  type  inference  calculi  to  unification-like  problems  that  distill  the  combinatorial  essence 
from  the  presentation  of  the  typing  problems.  In  particular,  we  show  that  semi-unification  is 
at  the  heart  of  Milner-Mycroft-style  type  inference.    Because  of  this  central  role,  we  study  the 


algebraic  and  algorithmic  aspects  of  semi-unification.  Although  semi-unification  appears  worthy 
of  study  on  the  merit  of  its  fundamental  character  alone,  we  show  that  most  of  the  results  on 
semi-unification  translate  back  to  type  inference  and  thus  yield  new  results  and  new  proofs  of 
known  results. 

1.3.1  Simple  type  inference  and  unification 

We  expand  on  some  work  by  Kanellakis  and  Mitchell  [53]  and  give,  in  detail,  a  log-space  reduction 
of  first-order  unification  to  simple  type  inference.  This  shows  that  simple  type  inference  is  log- 
space  equivalent  to  unification;  in  particular,  it  is  P-complete  under  log-space  reductions.  The 
encoding of first-order terms by $\lambda$-expressions is useful in later reductions.
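For readers who want a concrete picture of the first-order unification problem referred to here, the following Python sketch may help. It is a naive textbook formulation, not the reduction of the thesis and not an efficient algorithm. The term encoding (a variable is a string, a compound term is a tuple whose first component is the function symbol) and all function names are assumptions made for this illustration.

```python
def walk(t, s):
    """Follow variable bindings in substitution s until t is unbound."""
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    """Occurs check: does variable v occur in term t under s?"""
    t = walk(t, s)
    if isinstance(t, str):
        return t == v
    return any(occurs(v, arg, s) for arg in t[1:])


def unify(a, b, s=None):
    """Return a substitution (dict) unifying a and b, or None."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, str):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None                     # clash of function symbols
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s


def resolve(t, s):
    """Apply substitution s exhaustively to term t."""
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(resolve(arg, s) for arg in t[1:])


# Unify  a -> b  with  int -> a  (the constant int is a 0-ary symbol).
solution = unify(("->", "a", "b"), ("->", ("int",), "a"))
# x and f(x) are not unifiable because of the occurs check.
failure = unify("x", ("f", "x"))
```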

1.3.2  Polymorphic  type  inference  and  semi-unification 

Semi-unification is the problem of solving term inequalities, $M \le N$, where $\le$ is interpreted as
the subsumption preordering on terms: $M \le N \iff$ there is a substitution $\rho$ such that $\rho(M) = N$.
We present two polynomial-time reductions: from type inference in the Milner-Mycroft Calculus
(and the Milner Calculus) to semi-unification, and from semi-unification to type inference in the
Flat Milner-Mycroft Calculus, which is a (minimal) programming language with only top-level
polymorphically  typed  recursive  definitions.  As  corollaries  we  obtain  that 

1.  semi-unification  characterizes  type  inference  in  the  Milner-Mycroft  Calculus  up  to 
polynomial-time  equivalence; 

2.  type  inference  in  the  Milner-Mycroft  Calculus  can  be  efficiently  reduced  to  the  case  with 
only  a  single  recursive  definition  and  no  other  definitions  (Flat  Milner-Mycroft  Calculus); 
this  contradicts  Mycroft's  conjecture  that  the  complexity  of  type  inference  depends  expo- 
nentially on  the  degree  of  nesting  of  recursive  definitions  [85,  p.  228]; 

3.  Kanellakis  and  Mitchell's  seminal  result  of  PSPACE-hardness  for  the  Milner  Calculus  [53] 
extends  to  the  Flat  Milner-Mycroft  Calculus,  solving  a  question  posed  by  Kanellakis; 

4.  type  inference  in  the  programming  language  B  [75]  is  no  simpler  than  semi-unification  and 
type  inference  in  the  Milner-Mycroft  Calculus,  and  Meertens'  uniformly  terminating  type 
inference  algorithm  [74]  is  incomplete  in  the  sense  that  it  indicates  type  errors  for  some 
typable  B  programs. 
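The subsumption preordering defined at the start of this subsection can be made concrete with a small Python sketch. It is illustrative only: the term encoding (strings for variables, tuples for compound terms) and the names `match`, `apply_subst`, and `is_semi_unifier` are assumptions introduced here. The last function reads a semi-unifier of an inequality in the usual way, as a substitution that, applied to both sides, makes the left side subsume the right side. Checking a given candidate is easy; finding one is the hard problem studied in the thesis and is not attempted here.

```python
def match(m, n, rho=None):
    """Return a substitution rho with rho(m) == n, or None if m does not
    subsume n. Variables of n are treated as constants."""
    rho = {} if rho is None else rho
    if isinstance(m, str):                  # m is a variable
        if m in rho:
            return rho if rho[m] == n else None
        return {**rho, m: n}
    if isinstance(n, str) or m[0] != n[0] or len(m) != len(n):
        return None
    for x, y in zip(m[1:], n[1:]):
        rho = match(x, y, rho)
        if rho is None:
            return None
    return rho


def apply_subst(s, t):
    """Apply substitution s to term t (simultaneous, one pass)."""
    if isinstance(t, str):
        return s.get(t, t)
    return (t[0],) + tuple(apply_subst(s, arg) for arg in t[1:])


def is_semi_unifier(sigma, m, n):
    """Check that sigma(m) subsumes sigma(n)."""
    return match(apply_subst(sigma, m), apply_subst(sigma, n)) is not None


# x <= f(x) holds with rho = {x: f(x)}, although x and f(x) do not unify.
rho = match("x", ("f", "x"))

# f(x, x) <= f(y, z) fails as it stands, but sigma = {z: y} is a semi-unifier.
M, N = ("f", "x", "x"), ("f", "y", "z")
plain = match(M, N)
solved = is_semi_unifier({"z": "y"}, M, N)
```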

1.3.3  Algebraic  structure  of  semi-unification 

We show that strong equivalence, the standard formalization of "renaming of variables", does not
adequately capture the structure of the solutions of semi-unification problems, thus correcting a
statement by Chou [15]. A slightly weaker notion, weak equivalence, permits us to show that
the set of solutions of any semi-unification problem forms a complete lattice; in particular, there
is  always  a  most  general  solution  (semi-unifier)  unique  up  to  weak  equivalence  if  there  exists 
a  semi-unifier  at  all.  As  a  corollary,  the  connection  of  polymorphic  type  inference  and  semi- 
unification  yields  a  simultaneous  proof  of  the  principal  typing  property  for  the  type  systems  we 
investigate. 

1.3.4  Specification  of  most  general  semi-unifiers 

Most  general  semi-unifiers  exist  and  are  unique  modulo  weak  equivalence;  we  present  a  nondeter- 
ministic  algorithm  for  computing  the  most  general  semi-unifier  of  any  semi-unification  problem. 


It  contains  an  "extended  occurs  check"  that  eliminates  all  known  cases  that  lead  Mycroft's  [85, 
section  6]  and  Meertens'  [74,  algorithm  AA]  type  inference  algorithms  to  nontermination.  We 
conjecture  that  our  algorithm  terminates  uniformly,  thus  implying  decidability  of  the  Milner- 
Mycroft  Calculus  and  semi-unification,  a  currently  open  problem.  This  basic  algorithm  is  de- 
scribed in  three  paradigmatic  forms:  as  a  functional,  a  rewriting,  and  a  graph-theoretic  program 
specification.  All  three  are  proved  partially  correct. 

1.3.5  Efficient  algorithm  for  uniform  semi-unification 

We  study  a  space-efficient  algorithm  for  uniform  semi-unification,  a  provably  decidable  subclass 
of  general  (nonuniform)  semi-unification.  Kapur  et  al.  have  an  elegant  algorithm  for  deciding 
semi-unifiability  in  polynomial  time.  We  present  our  own,  independently  devised,  somewhat 
more  complicated  algorithm;  it  is  less  efficient,  but  computes  a  most  general  semi-unifier,  in 
contrast  to  their  decision  algorithm. 

1.3.6  Decidability  —  elementary  approaches 

We  present  some  basic  combinatorial  properties  of  the  graph-theoretic  version  of  our  basic  semi- 
unification algorithm in the hope that some deeper investigation will eventually lead to
establishing its uniform termination property. This seems appropriate to us since the "nonlocal"
nature  of  the  extended  occurs  check  in  our  specifications  suggests  that  combinatorial  properties 
are  stated  most  easily  in  a  graph-theoretic  setting. 

1.3.7  Implications  for  practical  programming  languages 

Beginning with the PSPACE-hardness result for the Milner Calculus there has been a gap between
the theoretical infeasibility of polymorphic type inference and its observed practical success.
This discrepancy appears even more pronounced in the Milner-Mycroft Calculus. We offer
a  tentative  explanation  of  this  gap  in  terms  of  resource-bounded  typings,  justified  by  the  intent 
of  typings  as  computational  and  conceptual  abstractions  of  the  computations  of  a  program.  If  we 
impose the restriction, reasonable in our view, that the inferred type information must not
be super-polynomially bigger than the size of the underlying programs, we can show that polymorphic
type inference in the style of the Milner and Milner-Mycroft calculi is both practically
and theoretically tractable.


Chapter  2 

Implicitly  Typed  Lambda  Calculi 


The aim of our work is to study the principal aspects of type checking and type inference in
programming  languages,  especially  as  they  relate  to  parametric  polymorphic  features.  To  do  this 
we  shall  use  a  language  that  contains  only  the  features  we  are  interested  in  so  as  to  understand 
them  independently  of  their  possible  interactions  with  other  language  features.  This  is  not  to 
say  that  other  features  are  irrelevant  or  of  less  interest.  In  fact,  operator  overloading  [52,  118], 
implicit  and  explicit  type  coercions  [101,  78,  27],  abstract  and  dependent  types  [82,  67,  34], 
recursive types [110, 70, 77] and especially inclusion polymorphism [10, 11, 111, 50, 98, 121, 122],
a  type-theoretic  view  of  the  behavior  of  object-oriented  programming  languages,  are  significant 
in  the  typing  disciplines  of  modern  strongly  typed  programming  languages  (e.g.,  [103,  12]).  But 
we cannot hope to combine several features and study their interactions before we understand
them individually. We refer the reader to [102] and [13] for an introduction and exposition of
types  and  type  checking  in  programming  languages. 

2.1      Untyped  Lambda  Calculus 

We start with a simple functional language $\Lambda$, the extended $\lambda$-calculus [90], also called Exp in
[23, 85]. It has function abstraction, application, definition, and fixed point computation. We
shall refer to it as the (untyped) $\lambda$-calculus even though the (pure) $\lambda$-calculus classically contains
only function abstraction and application [3].

2.1.1      Syntax 

The set $\Lambda$ of $\lambda$-expressions (expressions) is defined by the following abstract syntax.

\[
e ::= x \mid \lambda x.e \mid (e\,e') \mid \mathbf{let}\ x = e'\ \mathbf{in}\ e \mid \mathbf{fix}\ x.e
\]

where $x$ ranges over a countably infinite set $V$ of variables. In these productions $\lambda$, let, and fix
bind $x$ in $e$; let does not bind $x$ in $e'$, and application, denoted by juxtaposition, does not bind
anything at all. A variable or variable occurrence in an expression $e$ that is bound by $\lambda$ is a
$\lambda$-bound variable, respectively variable occurrence; the same holds for let and fix. If a variable occurrence
is not bound, it is free. A variable is free in a $\lambda$-expression $e$ if it has a free occurrence in $e$. The
convention for omitting parentheses is that application associates to the left, and application
has higher precedence than any other construction. We may abbreviate $\lambda x_1.\lambda x_2.\ldots\lambda x_k.e$ to
$\lambda x_1 x_2 \ldots x_k.e$ or $\lambda\vec{x}.e$ if $\vec{x}$ denotes the sequence $x_1 x_2 \ldots x_k$. $\lambda$-expressions will usually be denoted
by the letter $e$ and primed or subscripted versions of $e$; variables by $x, y$ along with their sub-
and superscripted variants.
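The abstract syntax and the binding conventions just described can be mirrored in a few lines of Python. This is a sketch for orientation only: the class names `Var`, `Abs`, `App`, `Let`, `Fix` and the function `free_vars` are assumptions made here and do not appear in the thesis.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Var:              # x
    name: str


@dataclass(frozen=True)
class Abs:              # lambda x.e
    var: str
    body: "Expr"


@dataclass(frozen=True)
class App:              # (e e')
    fun: "Expr"
    arg: "Expr"


@dataclass(frozen=True)
class Let:              # let x = e' in e
    var: str
    bound: "Expr"       # e', in which x is NOT bound by this let
    body: "Expr"        # e, in which x is bound


@dataclass(frozen=True)
class Fix:              # fix x.e
    var: str
    body: "Expr"


Expr = Union[Var, Abs, App, Let, Fix]


def free_vars(e: Expr) -> Set[str]:
    """Set of variables with at least one free occurrence in e."""
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, (Abs, Fix)):
        return free_vars(e.body) - {e.var}
    if isinstance(e, App):
        return free_vars(e.fun) | free_vars(e.arg)
    if isinstance(e, Let):
        return free_vars(e.bound) | (free_vars(e.body) - {e.var})
    raise TypeError(f"not an expression: {e!r}")


# let x = x in (x y): the x in the defining expression stays free.
example = Let("x", Var("x"), App(Var("x"), Var("y")))
# fix f. lambda x. f x is closed.
closed = Fix("f", Abs("x", App(Var("f"), Var("x"))))
```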

2.1.2      Operational  Semantics 

Instead of encoding renaming of $\lambda$-bound variables by an explicit axiom of $\alpha$-conversion (see,
e.g., [42, definition 1.16]) we follow Barendregt [3] and write $e \equiv e'$ if $e'$ is identical to $e$ except
that it may have some $\lambda$-bound variables systematically renamed. Every $\lambda$-expression is then
understood as a representative of its $\equiv$-equivalent expressions, and all operations on $\lambda$-expressions
are always defined on $\equiv$-equivalence classes. For $e, e' \in \Lambda$, $x \in V$, $e[e'/x]$ denotes the simultaneous
replacement of all free occurrences of $x$ in $e$ by $e'$; as usual we assume that bound variables in
$e$ are renamed appropriately to avoid "capturing" free variables in $e'$. This is an acceptable
convention with the proviso just made [3, p. 26].
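An implementation cannot work on equivalence classes directly, so the renaming that the convention leaves implicit has to be carried out explicitly. The Python sketch below shows one way to do this. The tagged-tuple encoding of expressions, the helper names, and the scheme for generating fresh names (`v0`, `v1`, ...) are all assumptions made for this illustration.

```python
import itertools

# Expressions as tagged tuples (an encoding assumed for this sketch):
#   ("var", x)  ("abs", x, e)  ("app", e1, e2)  ("let", x, e1, e2)  ("fix", x, e)


def free_vars(e):
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag in ("abs", "fix"):
        return free_vars(e[2]) - {e[1]}
    if tag == "app":
        return free_vars(e[1]) | free_vars(e[2])
    if tag == "let":                     # let x = e1 in e2 binds x in e2 only
        return free_vars(e[2]) | (free_vars(e[3]) - {e[1]})
    raise ValueError(tag)


def fresh(avoid):
    """Return the first name v0, v1, ... that is not in `avoid`."""
    return next(f"v{i}" for i in itertools.count() if f"v{i}" not in avoid)


def under_binder(y, body, x, r):
    """Substitute r for x in `body`, which lies in the scope of binder y."""
    if y == x:                           # x is shadowed: nothing to replace
        return y, body
    if y in free_vars(r) and x in free_vars(body):
        z = fresh(free_vars(r) | free_vars(body) | {x})
        body = subst(body, y, ("var", z))   # rename the binder to avoid capture
        y = z
    return y, subst(body, x, r)


def subst(e, x, r):
    """Capture-avoiding substitution e[r/x]."""
    tag = e[0]
    if tag == "var":
        return r if e[1] == x else e
    if tag == "app":
        return ("app", subst(e[1], x, r), subst(e[2], x, r))
    if tag == "let":
        y, body = under_binder(e[1], e[3], x, r)
        return ("let", y, subst(e[2], x, r), body)
    y, body = under_binder(e[1], e[2], x, r)    # "abs" and "fix"
    return (tag, y, body)


# (lambda y. x y)[y/x]: the bound y must be renamed first.
renamed = subst(("abs", "y", ("app", ("var", "x"), ("var", "y"))), "x", ("var", "y"))
# (lambda x. x)[y/x]: x is bound, so nothing changes.
shadowed = subst(("abs", "x", ("var", "x")), "x", ("var", "y"))
```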

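The operational semantics of $\lambda$-expressions is defined as the reflexive, transitive, compatible$^1$
closure, $\to$, of the union of the following notions of reduction (see [3, chapter 3]).

\[
\begin{array}{rcl}
(\lambda x.e)\,e' & \to_{\beta} & e[e'/x] \\
\mathbf{let}\ x = e'\ \mathbf{in}\ e & \to_{let} & e[e'/x] \\
\mathbf{fix}\ x.e & \to_{fix} & \mathbf{let}\ x = (\mathbf{fix}\ x.e)\ \mathbf{in}\ e
\end{array}
\]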

In  our  examples  we  may  sometimes  add  "constants"  such  as  natural  numbers  with  some 
arithmetic  operators  and  the  Boolean  values  with  some  logical  operators  to  our  A-calculus. 
Whenever  suitable  we  shall  use  infix  notation  for  constant  operations  instead  of  prefix.  We 
may tacitly assume the existence of suitable reduction relations, summarily called $\delta$-reductions,
that  implement  the  usual  semantics  on  those  constants.  Our  theory  is  developed  only  for  the 
"pure"  A-calculus,  although  —  or  because  —  it  can  easily  be  extended  to  include  constants. 

As  an  example  of  an  expression  with  constants, 

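\[
\mathbf{fix}\ f.\lambda x.\ \mathbf{if}\ x = 0\ \mathbf{then}\ 1\ \mathbf{else}\ x * f(x - 1)
\]

denotes the factorial function, and

\[
\mathbf{let}\ \mathit{fact} = \mathbf{fix}\ f.\lambda x.\ \mathbf{if}\ x = 0\ \mathbf{then}\ 1\ \mathbf{else}\ x * f(x - 1)\ \mathbf{in}\ \mathit{fact}\ 5
\]

reduces to 120 via $\to$.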
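The factorial example can be replayed in Python as a rough sanity check. The helper `fix` below is an assumption of this sketch, not a definition from the thesis. It mimics the unfolding of fix into a let that binds the variable to the fixed point itself, with the unfolding delayed until the function is applied because Python evaluates arguments eagerly.

```python
def fix(body):
    """Unfold  fix z.e  one step at a time: z is bound to (fix z.e) itself.
    The outer lambda postpones the unfolding until an argument arrives."""
    return lambda arg: body(fix(body))(arg)


# fix f. lambda x. if x = 0 then 1 else x * f(x - 1)
fact = fix(lambda f: lambda x: 1 if x == 0 else x * f(x - 1))

result = fact(5)   # 120
```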

Equality ($\beta$-equality), $=$, is the congruence relation generated by $\to$. As is well-known, for
the untyped $\lambda$-calculus we could have dispensed with let and fix since they are both definable
by abstraction and application alone:

\[
\begin{array}{rcl}
\mathbf{let}\ x = e'\ \mathbf{in}\ e & = & (\lambda x.e)\,e' \\
\mathbf{fix}\ x.e & = & Y(\lambda x.e)
\end{array}
\]

where $Y = \lambda f.WW$ and $W = \lambda x.f(xx)$, or $Y = WW$ and $W = \lambda x.\lambda y.y(xxy)$. For the second
definition of $W$ we also have $Y(\lambda x.M) \to (\lambda x.M)(Y(\lambda x.M))$.
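Both fixed point combinators can be tried out in Python, with one caveat that should be stated clearly: Python is a call-by-value language, so the self-applications have to be wrapped in an extra abstraction (an eta-expansion) to keep them from unfolding forever. That wrapping, and the names `Y_curry`, `Y_turing`, and `step`, are assumptions of this sketch and not part of the definitions above.

```python
# First definition: Y = lambda f. W W  with  W = lambda x. f (x x).
# The inner "lambda v: x(x)(v)" stands for (x x), delayed until it is applied.
Y_curry = lambda f: (lambda w: w(w))(lambda x: f(lambda v: x(x)(v)))

# Second definition: Y = W W  with  W = lambda x. lambda y. y (x x y).
W_turing = lambda x: lambda y: y(lambda v: x(x)(y)(v))
Y_turing = W_turing(W_turing)

# Body of the factorial definition, abstracted over the recursive call.
step = lambda f: lambda n: 1 if n == 0 else n * f(n - 1)

fact_curry = Y_curry(step)
fact_turing = Y_turing(step)
```

Neither combinator uses a recursive Python definition; recursion arises from self-application alone, which is exactly the kind of term that the typing disciplines considered below rule out.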

$^1$A relation $R$ is compatible if it is closed under taking contexts; that is, $(e_1, e_2) \in R$ implies $(C[e_1], C[e_2]) \in R$
for any context $C[\,]$ surrounding $e_1$, respectively $e_2$.


Nonetheless we shall keep the let and fix forms since there are typed versions of the $\lambda$-calculus
in which the above replacements are not possible because the right-hand sides may not necessarily
satisfy the typing rules (which is to say that the sort of typing we shall consider is in general not
closed w.r.t. equality).

2.2      Type  Inference  Systems 

It is not easy to find a modern set-theoretic interpretation of the $\lambda$-calculus in which application is
modeled by (set-theoretic) function application, and $\lambda$-abstraction is interpreted as the definition
of a (set-theoretic) function. This is mainly due to the possibility of unbridled self-application,
as in $xx$. Also, concerns over representation independence and type integrity in the design of
programming languages lead to the introduction of typing disciplines that restrict the class of
$\lambda$-expressions that are considered acceptable (well-typed). We shall briefly present the mechanism
for specifying various related typing disciplines.

2.2.1      Notational  Prerequisites 

The  notational  conventions  used  here  are  fairly  standard.  The  reader  familiar  with  [23]  and 
[85]  or  any  number  of  logically  specified  polymorphic  type  systems  is  encouraged  to  skip  this 
subsection. 

Type  Expressions 

The type expressions (types) are formed according to the following productions.

\[
\begin{array}{rcl}
\tau & ::= & \kappa \mid \alpha \mid \tau \to \tau \\
\sigma & ::= & \tau \mid \forall\alpha.\sigma
\end{array}
\]

where $\alpha$ ranges over an infinite set $TV$ of type variables disjoint from $V$, $\kappa$ ranges over given
primitive types, such as integer, Boolean, etc., and $\forall$ is a (type) variable binding operator.
The distinction between free and bound variables (variable occurrences) in type expressions is
as expected: all occurrences of $\forall$-bound variables are bound, all other occurrences are free. The
type expressions $M$ derivable from $\tau$ are the monotypes;$^2$ the type expressions $\Pi$ derivable from
$\sigma$ are called polytypes.

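For $\vec{\alpha} = \alpha_1\alpha_2\ldots\alpha_k$ we may write $\forall\alpha_1\alpha_2\ldots\alpha_k.\tau'$ or $\forall\vec{\alpha}.\tau'$ for $\forall\alpha_1.\forall\alpha_2.\ldots\forall\alpha_k.\tau'$. The function
type constructor, $\to$, is right-associative; that is, $\tau_1 \to \tau_2 \to \tau_3$ should be parsed as $\tau_1 \to
(\tau_2 \to \tau_3)$. For any type expression $\sigma$ we write $\sigma[\tau_1/\alpha_1, \ldots, \tau_k/\alpha_k]$ to denote the type expression
resulting from simultaneously substituting $\tau_i$ for all free occurrences of $\alpha_i$, $1 \le i \le k$, in $\sigma$.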

Note that the $\forall$-quantifiers in polytypes can only appear as prefixes of type expressions, which
is the critical difference from the Second Order $\lambda$-calculus [29, 100].

The Greek letter $\tau$ always indicates a monotype, while the Greek letter $\sigma$ signals a polytype,
and letters from the beginning of the Greek alphabet stand for type variables. This is the same
convention as in [23] and [85].
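The two-level grammar of monotypes and polytypes translates directly into data types. The Python sketch below is one possible rendering. The class and function names are assumptions made here, and a polytype is stored as a quantifier prefix over a monotype, which is faithful to the grammar because quantifiers can only occur as prefixes.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Set, Tuple, Union


@dataclass(frozen=True)
class TVar:                 # type variable
    name: str


@dataclass(frozen=True)
class Prim:                 # primitive type such as integer or Boolean
    name: str


@dataclass(frozen=True)
class Arrow:                # function type
    dom: "Mono"
    cod: "Mono"


Mono = Union[TVar, Prim, Arrow]


@dataclass(frozen=True)
class Forall:               # quantifier prefix over a monotype
    bound: Tuple[str, ...]
    body: Mono


def arrows(*ts: Mono) -> Mono:
    """Build t1 -> t2 -> ... -> tk, associating to the right."""
    result = ts[-1]
    for t in reversed(ts[:-1]):
        result = Arrow(t, result)
    return result


def free_type_vars(t) -> Set[str]:
    if isinstance(t, TVar):
        return {t.name}
    if isinstance(t, Prim):
        return set()
    if isinstance(t, Arrow):
        return free_type_vars(t.dom) | free_type_vars(t.cod)
    return free_type_vars(t.body) - set(t.bound)      # Forall


def substitute(t: Mono, mapping: Dict[str, Mono]) -> Mono:
    """Simultaneous substitution of monotypes for type variables."""
    if isinstance(t, TVar):
        return mapping.get(t.name, t)
    if isinstance(t, Prim):
        return t
    return Arrow(substitute(t.dom, mapping), substitute(t.cod, mapping))


def instantiate(p: Forall, args: Sequence[Mono]) -> Mono:
    """Replace the quantified variables of p by the given monotypes."""
    if len(args) != len(p.bound):
        raise ValueError("wrong number of type arguments")
    return substitute(p.body, dict(zip(p.bound, args)))


poly = Forall(("a",), Arrow(TVar("a"), TVar("b")))
instance = instantiate(poly, [Prim("int")])
```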

$^2$Note that, in contrast to [76] and [85], our monotypes can contain (necessarily free) occurrences of type
variables.

Full name                      Abbreviation            Acronym
Curry-Hindley Calculus         Hindley Calculus        CH
Damas-Milner Calculus          Milner Calculus         DM
Milner-Mycroft Calculus        Mycroft Calculus        MM
Flat Milner-Mycroft Calculus   Flat Mycroft Calculus   FMM

Figure 2.1: Names and abbreviations of typing calculi

Type Assignments

A type assignment (or type environment) $A$ is a mapping from a finite subset of $V$ (variables) to
$\Pi$ (polytypes). Type assignments are mostly used to formulate assumptions about the types of
variables occurring free in some expression under consideration. This is necessary since the type
of an expression $e$ depends, in general, on the types of variables occurring free in $e$. For given $A$
we define

\[
A\{x : \sigma\}(y) =
\begin{cases}
A(y), & y \ne x \\
\sigma, & y = x;
\end{cases}
\]

that is, the value of $A\{x : \sigma\}$ at $x$ is $\sigma$, and at any other value it is identical to $A$. We say a type
variable $\alpha$ occurs free in $A$ if it occurs free in $A(x)$ for some $x$ in the domain of $A$.
The capital letter $A$ henceforth always denotes a type assignment.
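Type assignments behave like finite maps with a non-destructive update, which the following Python sketch makes explicit. The encoding is an assumption chosen for brevity: a type variable is a string, `("prim", name)` a primitive type, `("->", t1, t2)` a function type, and `("forall", a, s)` a quantified type. The function names are hypothetical as well.

```python
def extend(A, x, sigma):
    """Return A{x : sigma}: sigma at x, identical to A everywhere else.
    A itself is left unchanged."""
    updated = dict(A)
    updated[x] = sigma
    return updated


def free_type_vars(t):
    """Free type variables of a type expression in the tuple encoding."""
    if isinstance(t, str):
        return {t}
    if t[0] == "prim":
        return set()
    if t[0] == "->":
        return free_type_vars(t[1]) | free_type_vars(t[2])
    if t[0] == "forall":
        return free_type_vars(t[2]) - {t[1]}
    raise ValueError(t)


def free_in_assignment(A):
    """Type variables occurring free in A(x) for some x in the domain of A."""
    result = set()
    for sigma in A.values():
        result |= free_type_vars(sigma)
    return result


A = {"f": ("forall", "a", ("->", "a", "b"))}
A2 = extend(A, "x", "c")
```

The set computed by `free_in_assignment` is what a side condition of the form "the type variable is not free in the assignment" would be checked against.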

Typings 

Typings are the well-formed formulae (judgments) of our type calculi. A typing consists of three
parts: a type assignment $A$, an expression $e$, and a type expression $\sigma$, written as $A \supset e : \sigma$. It
should be read as "In the type environment $A$, the expression $e$ has type $\sigma$". Of course, not all
typings are acceptable. Acceptability is defined statically by derivability in inference systems.

2.2.2      The  Hindley,  Milner,  Mycroft,  and  Flat  Mycroft  Calculi 

We  shall  study  four  type  inference  systems:  the  Curry-Hindley  Calculus,  the  Damas-Milner 
Calculus,  the  Milner-Mycroft  Calculus,  and  the  Flat  Milner-Mycroft  Calculus.  Instead  of  using 
their  full  names  we  shall  abbreviate  them  throughout  by  using  only  the  second  component  of 
their compound names in running text or their acronym in derivations, tables, etc. (see Figure 2.1).

With the exception of the Flat Mycroft Calculus all typing calculi under consideration here
share the fact that they are defined over the same class of programs ($\lambda$-expressions) and the same
set of judgments (typings). They differ only in their inference
rules. Since they share several of their axiom and rule schemes, though, a list of all axioms and
rules is given in Table 2.1. Table 2.2 shows which of the axioms and rules are present in which
calculus, and which ones are not.

Let $X$ be one of CH, DM, MM, FMM. We write $X \vdash A \supset e : \sigma$ if $A \supset e : \sigma$ is derivable in the
Hindley Calculus ($X$ = CH), the Milner Calculus ($X$ = DM), the Milner-Mycroft Calculus ($X$ =
MM), or the Flat Milner-Mycroft Calculus ($X$ = FMM). If $X$ is clear from the context, we may
simply write $A \supset e : \sigma$ to indicate that this typing is derivable in $X$. Let $e$ be a $\lambda$-expression.
We say $e$ is well-typed or typable in $X$ (or simply well-typed/typable,
if it is clear with respect to which typed calculus) if there is a type environment $A$
and a type expression $\sigma$ such that $A \supset e : \sigma$ is derivable in $X$. The typability problem for $X$ is the
problem of deciding the set of all well-typed expressions in $X$. We may often abbreviate "the
typability problem for the $X$ Calculus" to simply "the $X$ Calculus" as in "The Hindley Calculus
is log-space equivalent to unification". As we shall see below, every expression $e$ typable in the
$X$ Calculus has a unique (modulo some simple equivalence) "principal" type expression, given
a type assumption $A$, no matter what choice of $X$. The functional problem of computing the
principal type or outputting an indication of untypability for given $e$, $A$ will be called the type
inference problem for the $X$ Calculus.

Let $A$ range over type environments; $x$ over variables; $e, e'$ over $\lambda$-expressions; $\alpha$ over type
variables; $\tau, \tau'$ over monotypes; $\sigma, \sigma'$ over polytypes. The following are type inference axiom and
rule schemes.

Name      Axiom/rule

(TAUT)    $A\{x : \sigma\} \supset x : \sigma$

(GEN)     $\frac{A \supset e : \sigma}{A \supset e : \forall\alpha.\sigma}$   ($\alpha$ not free in $A$)

(INST)    $\frac{A \supset e : \forall\alpha.\sigma}{A \supset e : \sigma[\tau/\alpha]}$

(ABS)     $\frac{A\{x : \tau'\} \supset e : \tau}{A \supset \lambda x.e : \tau' \to \tau}$

(APPL)    $\frac{A \supset e : \tau' \to \tau \qquad A \supset e' : \tau'}{A \supset (e\,e') : \tau}$

(LET-M)   $\frac{A \supset e : \tau \qquad A\{x : \tau\} \supset e' : \sigma'}{A \supset \mathbf{let}\ x = e\ \mathbf{in}\ e' : \sigma'}$

(LET-P)   $\frac{A \supset e : \sigma \qquad A\{x : \sigma\} \supset e' : \sigma'}{A \supset \mathbf{let}\ x = e\ \mathbf{in}\ e' : \sigma'}$

(FIX-M)   $\frac{A\{x : \tau\} \supset e : \tau}{A \supset \mathbf{fix}\ x.e : \tau}$

(FIX-P)   $\frac{A\{x : \sigma\} \supset e : \sigma}{A \supset \mathbf{fix}\ x.e : \sigma}$

Table 2.1: Type inference axioms and rules

Axiom/rule   CH   DM   MM   FMM
TAUT         ✓    ✓    ✓    ✓
GEN          ✓    ✓    ✓    ✓
INST         ✓    ✓    ✓    ✓
ABS          ✓    ✓    ✓    ✓
APPL         ✓    ✓    ✓    ✓
LET-M        ✓
LET-P             ✓    ✓
FIX-M        ✓    ✓
FIX-P                  ✓    ✓

The mark ✓ indicates the corresponding axiom/rule is present in the calculus in whose column
it appears; blank space means it is not included. The Flat Mycroft Calculus is restricted to
$\lambda$-expressions with no let-operator and with only one occurrence of a fix-operator, which must
occur at top-level.

Table 2.2: The Hindley, Milner, Mycroft, and Flat Mycroft type inference calculi

The  Hindley  Calculus  corresponds  to  a  language  without  mandatory  variable  or  parameter 
type declarations; yet every variable has exactly one monotype. This is in the spirit of
conventional statically typed languages such as Pascal where every program variable and every
procedure  has  a  unique  type.  That  type  has  to  be  declared  within  the  program  itself,  in  contrast 
to  the  Hindley  Calculus. 
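To make the monomorphic discipline of the Hindley Calculus tangible, here is a small Python sketch of type inference by unification for closed expressions. It should be read as an illustration under stated assumptions, not as the algorithm of the thesis: it covers only monotypes (quantified types are ignored), it uses a naive unification procedure, and the tuple encodings of expressions and types as well as all names are invented here. Every bound variable receives exactly one monotype, including let-bound and fix-bound ones.

```python
import itertools


class TypeMismatch(Exception):
    """Raised when two monotypes cannot be unified."""


def walk(t, s):
    while isinstance(t, str) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t == v
    return any(occurs(v, arg, s) for arg in t[1:])


def unify(a, b, s):
    """Extend substitution s so that a and b become equal."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, str):
        if occurs(a, b, s):
            raise TypeMismatch("occurs check failed")
        return {**s, a: b}
    if isinstance(b, str):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        raise TypeMismatch("constructor clash")
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
    return s


def resolve(t, s):
    t = walk(t, s)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(resolve(arg, s) for arg in t[1:])


def infer(e, env, s, fresh):
    """Return (monotype, substitution) for expression e under env."""
    tag = e[0]
    if tag == "var":                      # look up the single monotype of x
        return env[e[1]], s
    if tag == "abs":                      # lambda x.e
        a = next(fresh)
        body_t, s = infer(e[2], {**env, e[1]: a}, s, fresh)
        return ("->", a, body_t), s
    if tag == "app":                      # (e e')
        fun_t, s = infer(e[1], env, s, fresh)
        arg_t, s = infer(e[2], env, s, fresh)
        res = next(fresh)
        return res, unify(fun_t, ("->", arg_t, res), s)
    if tag == "let":                      # monomorphic let: x gets one monotype
        bound_t, s = infer(e[2], env, s, fresh)
        return infer(e[3], {**env, e[1]: bound_t}, s, fresh)
    if tag == "fix":                      # monomorphic fix
        a = next(fresh)
        body_t, s = infer(e[2], {**env, e[1]: a}, s, fresh)
        return a, unify(a, body_t, s)
    raise ValueError(tag)


def infer_type(e):
    """Monotype of a closed expression, or None if it is untypable."""
    fresh = (f"t{i}" for i in itertools.count())
    try:
        t, s = infer(e, {}, {}, fresh)
    except TypeMismatch:
        return None
    return resolve(t, s)


identity = infer_type(("abs", "x", ("var", "x")))
self_application = infer_type(("abs", "x", ("app", ("var", "x"), ("var", "x"))))
```

The self-application fails the occurs check, which matches the remark earlier in this chapter that typing disciplines exclude unbridled self-application.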

The Milner Calculus encodes the polymorphism that results from the ability in languages
such as ML [31, 32], SPS [119], Miranda [117] to give let-bound variables $x$ a parameterized
type that is automatically and implicitly instantiated at all applied occurrences of $x$. Note
that in the rule (FIX-M) the type associated with the (presumably) recursively defined $x$ is a
monotype.$^3$ This implies that, intuitively, all occurrences of $x$ in a recursive definition fix $x.e$
are monomorphic; that is, they have the same monotype.

The  Mycroft  Calculus  models  a  language  such  as  Hope  [8]  that  permits  fix-bound  variables 
(i.e.,  for  the  most  part  recursively  defined  functions)  to  have  parameterized  types  that  can  be 
instantiated  arbitrarily  inside  the  scope  of  their  definition.  Hope  will  admit  such  polymorphically 
typed recursive definitions only at the top-level and requires explicit type declarations, whereas
our  Milner-Mycroft  Calculus  permits  even  nested  polymorphically  typed  recursive  definitions 
and  does  not  require  explicit  declarations. 

The Flat Mycroft Calculus has only $\lambda$-expressions of the form fix $f.e$ where $e$ contains only
variables, $\lambda$-abstractions, and applications, but no let- or fix-constructs. It adopts the polymorphic
typing rule from the Mycroft Calculus for its sole recursive definition. We call it "flat"
since no nesting of polymorphically typed definitions, as in the Milner Calculus (let-rule
(LET-P)) and in the Mycroft Calculus (let-rule (LET-P) and fix-rule (FIX-P)), is permitted.
This  essentially  models  polymorphic  programming  languages  with  only  top-level  definitions  that 
are  automatically  mutually  recursive,  as  in  (Polymorphic)  Prolog  [86],  B  [75],  or  (Polymorphic) 
SETL  [35]. 


*Remember that τ always stands for a monotype.

In our calculi we have deliberately excluded programming language features that have a strong
bearing  on  type  checking,  such  as  coercion,  overloading,  inclusion  polymorphism,  union  types, 
dependent  types;  not  to  mention  assignment,  references,  exceptions.  Note  also  that  the  typing 
disciplines are implicit: there is no mention of types in the programs (λ-expressions) themselves,
only  in  the  typing  statements  about  them.  This  is  to  say,  ours  is  the  "Curry  viewpoint":  types  are 
properties  of  (untyped)  programs.  This  is  in  contrast  to  the  "Church  viewpoint":  types  occur 
in  programs  and  are  instrumental  in  the  definition  of  what  constitutes  the  notion  of  (typed) 
program  in  the  first  place.  More  importantly,  though,  the  programming  language  considered 
here  has  a  fixed  point  constructor  and  is  thus  universal,  similar  to  LCF,  yet  very  much  in 
contrast  to  many  typed  calculi  that  are  of  interest  for  the  very  absence  of  a  general  fixed  point 
operator.  This  is  the  main  reason  why  we  do  not  refer  to  what  we  call  the  Hindley  Calculus 
as Church's Typed λ-calculus. What we call Milner Calculus is called ML (by Kfoury et al.
[57]),  or  more  loosely  let-polymorphism  or  Milner-style  polymorphism.  Since  it  is  well-known 
that  side-effects  and  pointers  have  an  effect  on  the  soundness  of  polymorphic  typing  disciplines 
[22,  69,  116],  we  prefer  not  to  call  this  typing  calculus  ML,  a  concrete  programming  language 
with side-effects, pointers and several other features. For similar reasons Kfoury et al.'s ML+
is our Mycroft Calculus.* The general rationale for our choice of names is that the calculi are
named  after  researchers  that  are  prominently  associated  with  investigating  their  properties. 

2.2.3     Properties  of  Typed  Calculi 

It is quite clear that the Milner-Mycroft Calculus is more powerful than the Milner Calculus,
which in turn is more powerful than the Hindley Calculus; that is to say, every λ-expression
typable in the Hindley Calculus is typable in the Milner Calculus, and every λ-expression typable
in the Milner Calculus is typable in the Mycroft Calculus. Even stronger, the sets of derivable
typings in each of these calculi are in a containment relation along the same lines. These
inclusions of typable expressions are proper. Consider, for example, the expressions
$e_0 = \mathbf{let}\ x = \lambda y.y\ \mathbf{in}\ (x\,x)$ and $e_1 = \mathbf{fix}\ f.\lambda x.(f\,f)$. The expression $e_0$ is typable in the Milner Calculus due to
the rule (LET-P), but not in the Hindley Calculus; $e_1$ is typable in the (Flat) Mycroft Calculus
due to rule (FIX-P), but not in the Milner Calculus. For example,

$$\mathrm{DM} \vdash \{\} \supset \mathbf{let}\ x = \lambda y.y\ \mathbf{in}\ (x\,x) : \forall\alpha.\alpha \to \alpha$$
$$\mathrm{MM} \vdash \{\} \supset \mathbf{fix}\ f.\lambda x.(f\,f) : \forall\alpha.\forall\beta.\alpha \to \beta$$

This  shows  that,  indeed,  the  Hindley  Calculus,  the  Milner  Calculus,  and  the  Mycroft  Calculus 
form  a  hierarchy  of  properly  more  powerful  typing  disciplines.  For  completeness'  sake  we  shall 
briefly  touch  upon  results  that  show  that  the  type  systems  we  consider  here  are  not  just  syntactic 
in nature, but interact with the semantics of λ-expressions in an orderly fashion.

Soundness 

Milner [76] presents a formal denotational semantics for expressions and types that allows
specification of a semantic notion of validity. A type system is said to be sound (with respect to
Milner's semantics) if all typings derivable are also semantically valid. We present only the
following theorem and refer the reader to [76], [23], [85] or [70] for an exposition of semantic issues
only alluded to here.

*Compounding the potential for confusion is that Jategaonkar and Mitchell are investigating an object-oriented
extension to ML, called ML++. They call their initial design in this direction ML+.

Theorem 1

1. The Milner Calculus is sound (with respect to Milner's semantics).

2. The Milner-Mycroft Calculus is sound (with respect to Milner's semantics).

Proof: 

1.  See  [23]. 

2.  See  [85]. 

It  may  be  noted  that  the  soundness  of  the  Milner  Calculus  also  follows  immediately  from 
the  soundness  of  the  Milner-Mycroft  Calculus  and  the  fact  that  the  Milner-Mycroft  Calculus 
subsumes  the  Milner  Calculus. 

Subject  Reduction 

None of our typing disciplines are semantically complete since the property of typability is not
invariant under β-equality. For example, for $K = \lambda x.\lambda y.x$ and $I = \lambda x.x$ the expression $(K\,I)(\lambda x.(x\,x))$
is not typable in any of the typing disciplines under consideration here, yet $(K\,I)(\lambda x.(x\,x)) = I$,
and $I$ is clearly typable. A dynamically typed language is a programming language with a
nontrivial typing discipline that is invariant under equality. Examples are LISP (but not the pure
λ-calculus), APL, and SETL. Every dynamically typed (universal) language has a necessarily
undecidable typability problem in view of Scott's version of Rice's theorem [3, chapter 6.6]. This
in fact necessitates run-time type checking, hence motivating calling it "dynamically typed" in
the first place.

Even  though  our  static  typing  disciplines  are  not  invariant  under  equality,  a  slightly  weaker, 
yet  very  desirable  property  holds. 

Theorem  2   (Subject  reduction  property) 

Let X = CH, DM, or MM. If $X \vdash A \supset e : \sigma$ and $e \to e'$, then $X \vdash A \supset e' : \sigma$.

Proof:     See  Curry  and  Feys  [21]  for  X  =  CH.  The  proofs  for  DM  and  MM  are  simple 
generalizations  of  Curry  and  Feys's  original  proof. 

This theorem expresses that once a λ-expression has been found to have some type,
reducing  the  expression  will  preserve  that  type.  In  particular,  it  is  never  possible  to  encounter 
an  untypable  intermediate  result  when  evaluating  (reducing)  any  typable  expression. 

Principal  Typings 

Note  that  there  may  be  many  different  typings  for  a  single  expression.  In  this  subsection  we 
briefly  summarize  for  our  type  systems  what  has  been  called  the  principal  typing  property:  Given 
a  type  assignment  A  every  expression  that  has  a  type  under  A  has  a  unique  most  general  type 
under  A. 

The generic instance preordering $\sqsubseteq$ between types is given by

$$\forall\alpha_1 \ldots \alpha_n.\tau \sqsubseteq \forall\beta_1 \ldots \beta_m.\tau[\tau_1/\alpha_1, \ldots, \tau_n/\alpha_n]$$

for any monotypes $\tau_1, \ldots, \tau_n$ whenever every $\beta_i$ ($1 \leq i \leq m$) is not free in $\forall\alpha_1 \ldots \alpha_n.\tau$. The
equivalence induced by $\sqsubseteq$ is simply renaming of $\forall$-bound type variables and is denoted by $\equiv$.
Let X = CH, DM, MM, or FMM Calculus. We say $\sigma$ is a principal type for e under A in X if

$$X \vdash A \supset e : \sigma$$

and for any type $\sigma'$ such that

$$X \vdash A \supset e : \sigma'$$

we have $\sigma \sqsubseteq \sigma'$. Clearly, principal types are unique modulo $\equiv$. If for every A and every e, e
has a principal type under A (or has no type under A), then we say the whole calculus X has
the principal typing property.

Theorem  3   (Principal  typing  property) 

Let  X  =  CH,  DM,  MM,  or  FMM  Calculus.  X  has  the  principal  typing  property. 

Proof:     For  CH,  see  [41,  20];  for  DM,  [23];  for  MM  and  FMM,  [85]. 

It is easy to see that if a closed λ-expression e (a λ-expression without free variables) has a
type $\sigma$ under any type assignment A then it has type $\sigma$ under the empty assignment {}, and
vice versa. For this reason we can speak of the principal type of e (independent of any type
assignment).

The  type  inference  problem  is  the  (functional)  problem  of  computing  a  principal  type  for 
given  A  and  e  or  flagging  untypability.  Of  course,  the  (decision)  problem  of  typability  is  trivially 
solvable  once  the  type  inference  problem  has  been  solved.  The  converse,  though,  is  not  necessarily 
true  even  though  essentially  all  current  type  checking  algorithms  for  our  typing  disciplines  also 
compute,  directly  or  indirectly,  principal  types.* 

Note that even though all type systems under consideration here have the principal typing
property, it may be that the principal type for an expression e in one calculus is different from
the principal type in another (for fixed A). Consider, for example, the Standard ML definition
of "map" and "squarelist" in the program example in chapter 1. In the Milner Calculus the
principal type of "map" is the monotype $(\mathrm{int} \to \mathrm{int}) \to \mathrm{int\ list} \to \mathrm{int\ list}$ whereas in the
Mycroft Calculus it is $\forall\alpha.\forall\beta.(\alpha \to \beta) \to \alpha\ \mathrm{list} \to \beta\ \mathrm{list}$. Of course, this presumes an encoding
of the mutually recursive definition of "map" and "squarelist" and of the SML type constructor
list into the λ-calculus and the language of our type expressions. This is difficult since lists are
a recursive data type, but a simpler "pure" example illustrating the difference is

fun I x = x and
J = I y0

under the type assignment $A_0 = \{y_0 : \mathrm{int}\}$. There are standard ways for encoding tuples and
mutually recursive definitions by single recursive definitions in the λ-calculus. The above SML
program can thus be transformed into a single recursive definition $e_0$ in the λ-calculus,

$$\mathbf{fix}\ f.\lambda g.g(\lambda x.x)(f(\lambda x.\lambda y.x)\,y_0).$$

This expression, $e_0$, is typable under $A_0$ both in the Milner Calculus and in the Mycroft
Calculus. The principal types, however, are $((\mathrm{int} \to \mathrm{int}) \to \mathrm{int} \to \mathrm{int}) \to \mathrm{int}$ in the Milner
Calculus and $\forall\alpha.\forall\beta.((\alpha \to \alpha) \to \mathrm{int} \to \beta) \to \beta$ in the Mycroft Calculus.

*The fact that the uniform semi-unifiability algorithm of Kapur et al. [54] does not compute most general
uniform semi-unifiers (the equivalent of principal types) can be viewed as a remarkable exception.

2.3      Background

The calculi we consider have appeared in the literature before, with some variations. Curry,
Hindley and others investigated the properties of "functionality" of combinatory logic [21, 84,
20, 41, 4], which is essentially what we call the Hindley Calculus. The Milner Calculus, in its
logical form as a typed λ-calculus, was investigated by Damas and Milner [23, 22] on the basis
of earlier work by Milner [76].

As early as in the late 70s Wadsworth reportedly worked on extending the well-known type
inference algorithm W for the Milner Calculus [76] to capture the more general typing rule
(FIX-P) in (what we call) the Mycroft Calculus, but apparently did not publish his work [68].
The polymorphic programming language B [75] has an extended rule for typing recursive
definitions analogous to Mycroft's. Meertens [74], who designed it unaware of ML's polymorphic type
system, presents a uniformly terminating type inference algorithm for B. Since B has neither
higher-order functions nor nested declarations, Meertens raised the question of whether type
inference in what we call the Mycroft Calculus is decidable.* Exploring static typing for logic
programming [86], Mycroft [85] investigated the properties of ML with an extended rule for
recursive definitions that allows for polymorphically typed occurrences of the defined function
in its body. He was able to show that the resulting calculus, which we have called the Mycroft
Calculus, is sound with respect to Milner's [76] semantics and that the principal typing property
of the Milner Calculus is preserved. The standard unification-based type inference algorithm is
not complete for the extended calculus, though. Mycroft provided a semi-algorithm for
computing principal typings, but he left the computability of that question and the decidability of the
calculus open. Leiß [64] gave an alternate type inference system for the Mycroft Calculus (along
with an extension of polymorphic type inference to record-based subtyping) based on term
inequalities with context conditions. The decidability of the Mycroft Calculus was specifically
addressed by Kfoury et al. [57]. They showed that typability in the Mycroft Calculus can be
reduced to a "Generalized Unification Problem (GUP)" [56], which is similar to Leiß's
formulation of inequalities with context conditions, and embarked on showing that, if a GUP instance
has a solution, it has a solution whose size can be bounded recursively as a function of the size
of the input. Once proven, such a bound would give an essentially nonalgorithmic
proof of decidability for the Mycroft Calculus. The authors have reported a flaw in their proof,
however, and the general decidability of the Mycroft Calculus remains open.

Whereas the Milner-Mycroft Calculus admits polytypes with universal quantifiers only in
prefix position on the top level, the Second Order λ-calculus [100] relaxes this constraint
and permits polytypes with nested quantifiers. In such a system the let-construct is unnecessary
since the equivalent description of a let-expression in the pure λ-calculus is typable if and only
if the let-expression itself is typable. Bohm [6] showed that partial type inference in the Second
Order λ-calculus is undecidable whereas, interestingly, the decidability of full type inference for
the same type system has been open [65, 28] since the inception of the Second Order λ-calculus.
Girard's system $F_\omega$ [29] generalizes the Second Order λ-calculus to type expressions of arbitrary
finite order. Pfenning [91] refined Bohm's result by showing that type inference in the n-th
Order λ-calculus, $F_n$, is equivalent to n-th order unification [46, 30]. The typable λ-expressions
in the conjunction type discipline of [18] (see also [106, 83]) are exactly the strongly normalizing
(untyped) λ-expressions, which implies that type inference is undecidable. Nonetheless, the
Second Order λ-calculus and the conjunction type discipline have had a direct influence on
programming language design. The language LEAP is directly based on $F_\omega$ and employs
partial type inference with satisfactory practical performance [92]. Reynolds' language Forsythe
[103] makes use of a conjunction type discipline.

There are many more very powerful type systems whose immediate application is in proof
theory. They exploit and extend the "types-as-propositions" (and "expressions-as-proofs")
analogy [44] to formulate constructive proof systems. A sample of such systems is AUTOMATH [24],
Martin-Löf type theory [73], the Calculus of Constructions [19], and LF [33].

*In chapter 4 we shall see that Mycroft-style type inference is not greatly affected by the absence or presence
of higher-order functions and nested definitions. Meertens' uniformly terminating type inference algorithm is
syntactically incomplete in that it signals a type error for some programs that are type correct with respect to
the typing rules for B (or, equivalently, with respect to his semi-algorithm AA).

Chapter 3

Semi-Unification:  Basic  Notions 
and  Results 


Semi-unification is the problem of solving sets of inequalities of the form $M_1 \leq M_2$ in the
subsumption lattice of free first-order terms. The special case of solving single inequalities has
found applications in proving nontermination of term rewriting systems [54] while the general
case characterizes type inference in the Mycroft Calculus (see chapter 4). Since this problem does
not seem to have attracted broad attention in computer science, in this chapter and chapter 5 we
give a comprehensive treatment of its basic algebraic properties and contrast it with unification,
the problem of solving term equations.

Unification and semi-unification deal with related problems. Unification addresses solving
equations between free first-order terms while semi-unification tackles the more general question
of solving systems of equations and inequalities* (SEI's) where inequalities, $M_1 \leq M_2$, between
terms $M_1$ and $M_2$ refer to the subsumption preordering $\leq$ on terms.

In  this  chapter  we  introduce  the  basic  machinery  of  semi-unification.  In  particular,  section  3.1 
describes  terms  and  substitutions  and  their  basic  algebraic  structure,  and  section  3.2  contains 
definitions  of  systems  of  equations  and  inequalities  and  their  solutions,  semi-unifiers,  as  well 
as  some  basic  results.  In  chapter  4  we  shall  show  why  semi-unification  is  relevant  to  type 
inference,  and  in  chapter  5  we  investigate  the  algebraic  structure  of  semi-unifiers,  which  in  turn 
has  reverberations  on  the  structure  of  typings.  Algorithms  for  computing  most  general  semi- 
unifiers  can  be  found  in  chapter  6,  and  some  combinatorial  properties  of  our  basic  semi-unification 
algorithm  are  in  chapter  7. 

3.1      The  Algebraic  Structure  of  Terms  and  Substitutions 

In this section we define the objects of our universe of discourse, terms and substitutions, and
investigate elementary aspects of their algebraic structure. The material is mostly extracted
from [47], [26], and [63]; much of the material dates back to [93], [94], [99], and [46]. Some
definitions and results appear to be new. Though simple refinements of standard concepts and
results, they are useful in later sections.

*We find the prevalent terminology somewhat unfortunate. While there is a distinction between "equation"
(something that is to be solved) and "equality" (something that holds), there is no corresponding distinction
with "inequality" since the term "inequation" is not commonly used in the English language. Even worse,
"inequality" gives no indication as to whether ≤ (less-than-or-equal-to) or ≠ (not-equal-to) is meant, and there is
no standard linguistic mechanism for distinguishing between these two. The term "inequation" has popped up in
the literature, but, since it is still uncommon, we will use "inequality" throughout. This also makes it possible,
admittedly somewhat artificially, to distinguish our systems of equations and inequalities from the related, but
different, systems of equations and inequations in [17] and [63].

3.1.1      Basic  Definitions 

Definition 1 (Ranked alphabet, functors, constants, variables, terms)

A (ranked) alphabet $A$ is a pair $(F, a)$ where $F$ is a nonempty, denumerable set whose
elements are called functors and $a : F \to \mathbb{N}$ maps every functor $f$ to its arity $a(f)$. Functors
with arity 0 are called constants. $A$ is linear if all its functors have arity at most 1, nonlinear
otherwise.

A set of variables $V$ for $A = (F, a)$ is an infinite denumerable set disjoint from $F$.

The set of proper (first-order) terms $T(A, V)$ (or simply $T$ whenever $A$ and $V$ are understood),
where $V$ is a set of variables for $A$, consists of all strings derivable from $M$ in the grammar

$$M ::= x \mid f(\underbrace{M, \ldots, M}_{k\ \text{times}})$$

where $f$ is a functor from $A$ with arity $k$, and $x$ is any variable from $V$. The set of (first-order)
terms $T^{\Omega}(A, V)$ (or simply $T^{\Omega}$) is $T(A, V)$ with an additional distinguished element $\Omega$ called the
undefined term.*

Variables are usually denoted by $u, v, x, y, z$, constants by $c, d$, nonconstant functors by $f, g, h$,
and terms by $M, N$, as well as by their respective subscripted and superscripted versions. To
indicate the arity $k$ of a functor $f$ we may write $f^{(k)}$. With these conventions in place we
shall omit the parentheses following constants appearing in terms since this cannot lead to any
confusion.

Two terms $M_1, M_2 \in T^{\Omega}$ are equal, denoted $M_1 = M_2$, if and only if $M_1$ and $M_2$ are identical
as strings; e.g., $f(x, y) = f(x, y)$, but $f(x, y) \neq f(u, v)$. The special term $\Omega$ is equal to itself
and no other term.

The distinction between linear and nonlinear alphabets is crucial since terms over a linear
alphabet can have at most one variable whereas terms over a nonlinear alphabet can contain any
number of variables. In a nonlinear alphabet it is always possible to "emulate" a functor of
arbitrary arity. For example, $g^{(4)}(M, N, N, N)$ can be viewed as a term $f^{(2)}(M, N)$ with a fictitious
binary functor $f^{(2)}$, and $h^{(2)}(h^{(2)}(M_1, M_2), M_3)$ can be interpreted as a term $f^{(3)}(M_1, M_2, M_3)$
with an "emulated" ternary functor $f^{(3)}$. In this sense we are justified in stipulating the existence
of a functor $f$ with any arity $k \geq 1$ without loss of generality in a nonlinear alphabet. Note that
this is, of course, not possible with linear alphabets.

Definition 2 (Substitutions)

A proper (first-order) substitution is a mapping from $V$ to $T(A, V)$ that is the identity on
all but a finite subset of $V$. Every substitution $\sigma : V \to T(A, V)$ can be extended uniquely to
$\hat{\sigma} : T^{\Omega}(A, V) \to T^{\Omega}(A, V)$ by the equations

$$\hat{\sigma}(x) = \sigma(x), \quad \text{if } x \in V$$
$$\hat{\sigma}(\Omega) = \Omega$$
$$\hat{\sigma}(f^{(k)}(M_1, \ldots, M_k)) = f^{(k)}(\hat{\sigma}(M_1), \ldots, \hat{\sigma}(M_k)).$$

The domain $\mathrm{dom}\,\sigma$ of $\sigma : V \to T(A, V)$ is $\{x \in V \mid \sigma(x) \neq x\}$. The canonical representation of
$\sigma$ with $\mathrm{dom}\,\sigma = \{x_1, \ldots, x_n\}$ is $\{x_1 \mapsto \sigma(x_1), \ldots, x_n \mapsto \sigma(x_n)\}$.

The mapping $\omega_{T(A,V)}$, which maps all terms $M \in T^{\Omega}(A, V)$ to $\Omega$, is called the undefined
substitution. The set of all proper substitutions is denoted by $S(A, V)$ (or simply $S$ whenever $A$
and $V$ are understood from the context). The set of (first-order) substitutions $S^{\omega}(A, V)$ consists
of $S(A, V)$ with the additional mapping $\omega_{T(A,V)}$.

*Of course we make the standard assumption here that neither $A$ nor $V$ contain $\Omega$ or any of the symbols '(',
')', or ','.

We shall omit the subscript from $\omega_{T(A,V)}$ below whenever $A$, $V$, and thus $T(A, V)$ are clear
from the context. Similarly, we will identify, as is usual, every substitution $\sigma$ with its extension
$\hat{\sigma}$. In this chapter and chapter 5 substitutions are ranged over by $\rho, \sigma, \tau, \upsilon$ along with their sub-
and superscripted variations. To avoid confusion with type expressions, in the other chapters
they may also be denoted by letters $R, S, U$.

A substitution specifies the simultaneous replacement of some set of variables by specific
terms. For example, for $\sigma_0 = \{x \mapsto u, y \mapsto v, u \mapsto y, v \mapsto z\}$ we have $\sigma_0(f(x, y)) = f(u, v)$.
The undefined substitution maps everything to the undefined term; e.g., $\omega(f(x, y)) = \Omega$ and
$\omega(\Omega) = \Omega$.

For $\sigma \in S$, we will write $\sigma|_W$ for the substitution defined by

$$\sigma|_W(x) = \begin{cases} \sigma(x), & x \in W \\ x, & x \notin W \end{cases}$$

Furthermore, $\omega|_W = \omega$.

Clearly  substitutions,  if  understood  as  acting  on  terms,  are  closed  with  respect  to  functional 
composition. The undefined term $\Omega$ and the undefined substitution $\omega$ are useful in providing
a  meaning  for  the  dynamic  notion  of  "failure"  in  unification  and  other  applications.  They  also 
lead  to  a  very  satisfying  algebraic  structure  of  terms  and  substitutions  (see  theorems  4  and  16) 
in  chapter  5. 

3.1.2      Term  Subsumption 

Let  A  be  an  arbitrary,  but  fixed  alphabet  in  this  section,  and  let  V  be  a  set  of  variables  for  A. 

Definition 3 (Subsumption, α-congruence)

The preordering $\leq$ of subsumption* on $T^{\Omega}$ is defined by

$$M_1 \leq M_2 \Leftrightarrow (\exists \sigma \in S^{\omega})\ \sigma(M_1) = M_2$$

for any $M_1, M_2 \in T^{\Omega}$.

The congruence relation $\cong$ of α-congruence on $T^{\Omega}$ is defined by

$$M_1 \cong M_2 \Leftrightarrow M_1 \leq M_2 \wedge M_2 \leq M_1$$

for all $M_1, M_2 \in T^{\Omega}$. We write $M_1 < M_2$ if $M_1 \leq M_2$, but $M_1 \not\cong M_2$. For any $M \in T^{\Omega}$, $[M]$
denotes the equivalence class of $M$ in $T^{\Omega}$.

If $M_1 \leq M_2$ we say $M_1$ subsumes $M_2$; e.g., $f(x, y)$ subsumes $f(g(y), z)$ since for $\sigma_1 = \{x \mapsto
g(y), y \mapsto z\}$ the equality $\sigma_1(f(x, y)) = f(g(y), z)$ holds. If $M_1 \cong M_2$ we say $M_2$ is an α-variant
of $M_1$ and vice versa; e.g., $f(x, y)$ is an α-variant of $f(u, v)$.

Recall that a partial order $(L, \leq)$ is a (complete) lower semi-lattice if it has a greatest lower
bound for every finite (finite or infinite) subset of $L$. It is a (complete) upper semi-lattice if it
has a least upper bound for every finite (finite or infinite) subset of $L$. It is a (complete) lattice
if it is both a (complete) lower semi-lattice and a (complete) upper semi-lattice [66]. Recall also
that a partial ordering is Noetherian if it has no infinite descending chains $M_1 > M_2 > \ldots$ [47].
It is well-known that any Noetherian lower semi-lattice is a complete lower semi-lattice, and any
complete (lower or upper) semi-lattice is already a complete lattice.

*Note that this definition follows [47] and [26], but is dual to the definition in [63].

The preordering $\leq$ on $T^{\Omega}$ induces a partial order on the quotient set $T^{\Omega}/{\cong} = \{[M] \mid M \in T^{\Omega}\}$,
which we will also denote by $\leq$. The structure of terms with respect to subsumption is captured
in the following theorem.


Theorem 4

1. $(T^{\Omega}/{\cong}, \leq)$ is Noetherian.

2. $(T^{\Omega}/{\cong}, \leq)$ is a complete lattice.

Proof:     See [47].


The least upper bound of a set of terms is called its most general common instance; its
greatest lower bound is called its most specific common anti-instance. The theorem expresses
that both most general common instance and most specific common anti-instance are unique
modulo α-congruence. Finding the most general common instance of a pair of terms is a special
case of the unification problem (disjoint variable case). Finding the most specific common
anti-instance of a pair is the anti-unification problem [46, 63]. A most general common instance
of $\{f(x, g(y)), f(g(y), z)\}$ is $f(g(y), g(z))$, but also $f(g(u), g(v))$; a most specific common
anti-instance is $f(s, t)$. Clearly, $[x] = V$ ($x$ any variable) is the least element and $[\Omega] = \{\Omega\}$ is the
greatest element in $T^{\Omega}/{\cong}$.

The  subsumption  preorder  can  be  extended  to  substitutions,  but  not  in  a  unique  fashion. 
Different  notions  and  their  implications  are  studied  in  chapter  5. 


3.2      Systems of Equations and Inequalities and Semi-Unifiers

In  this  section  we  present  basic  definitions  and  properties  of  inequalities  over  the  subsumption 
preordering  of  terms  and  their  solutions. 


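Definition 4 (System of equations and inequalities, nonuniform/uniform semi-unifier, unifier)

A system of equations and inequalities (SEI) is a pair $S = (E, I)$ where $I = (I_1, \ldots, I_k)$ for
some $k \in \mathbb{N}$ and $E, I_1, \ldots, I_k$ each consist of a set of pairs of terms from $T$, usually written in
the form*

$$M = N \quad \text{for the pairs in } E, \qquad M \leq N \quad \text{for the pairs in } I_1, \ldots, I_k.$$

A substitution $\sigma$ for which there exist substitutions $\rho_1, \ldots, \rho_k$ such that

$$\sigma(M) = \sigma(N) \quad \text{for every pair } (M, N) \text{ in } E,$$
$$\rho_i(\sigma(M)) = \sigma(N) \quad \text{for every pair } (M, N) \text{ in } I_i \ (1 \leq i \leq k)$$

hold simultaneously* is called a (nonuniform) semi-unifier of $S$. If $\rho_1 = \rho_2 = \cdots = \rho_k = \rho$ for
some $\rho$, then $\sigma$ is called a uniform semi-unifier, and if furthermore $\rho = \iota$, the identity substitution,
then $\sigma$ is called a unifier.

*Note that "=" and "≤" are only formal here.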

$S$ is solvable if it has a semi-unifier other than $\omega$. $\mathrm{SU}(S)$ is the set of semi-unifiers of $S$,
$\mathrm{USU}(S)$ the set of its uniform semi-unifiers, and $\mathrm{U}(S)$ is the set of its unifiers.

The special symbol $\Box$ is an additional SEI that has only $\omega$ for a unifier and (non)uniform
semi-unifier; we call $\Box$ the (only) improper SEI.* The set of all proper systems of equations and
inequalities over alphabet $A$ and variables $V$ is denoted by $\Gamma(A, V)$ (or simply $\Gamma$ whenever $A$ and
$V$ are understood from the context). $\Gamma(A, V)$ with the additional improper SEI $\Box$ is denoted by
$\Gamma^{\Box}(A, V)$.

*It is actually irrelevant whether $\omega$ is permitted amongst the $\rho_i$ or not.

*Here the symbols = and ≤ denote their logical meanings.

*Here "=" denotes term equality.

*Note that there are proper SEI's that have only the improper $\omega$ as their sole semi-unifier.

Semi-unifiability is the decision problem of determining if a given SEI is solvable (has a proper
semi-unifier). As we shall see in chapter 5, every solvable SEI has a most general semi-unifier that
is unique up to an appropriate equivalence relation on substitutions. The term semi-unification
refers to the (functional) problem of computing a most general semi-unifier of a given SEI or
flagging non-semi-unifiability. Similarly, uniform semi-unifiability and uniform semi-unification
as well as unifiability and unification are the decision and functional problems, respectively, that
correspond to finding uniform (proper) semi-unifiers and (proper) unifiers. Often we will be
sloppy and use the term for the functional problem to also denote the decision problem.

A  semi-unifier,  in  other  words,  is  a  solution  to  a  given  set  of  equations  and  inequalities 
where  the  inequalities  are  split  into  groups  that  "share"  the  same  quotient  substitution,  but 
the  quotient  substitutions  across  different  groups  of  inequalities  can  be  different.  A  uniform 
semi-unifier  additionally  solves  the  inequalities  in  a  "uniform"  fashion*,  and  a  unifier  solves  the 
inequalities  by  making  both  sides  equal.  By  definition,  if  an  SEI  has  a  unifier  it  has  a  uniform 
semi-unifier,  and  if  it  has  a  uniform  semi-unifier  it  has  a  semi-unifier. 
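
The three notions can be compared on small examples. The following is a minimal sketch, not part of the original development: it assumes terms are encoded as Python strings (variables) or tuples `(functor, arg, ...)`, an SEI as a pair `(equations, groups)`, and it ignores the improper substitution. All function names are hypothetical.

```python
def apply(subst, term):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(apply(subst, arg) for arg in term[1:])

def match(pattern, target, rho):
    """Extend rho so that rho(pattern) == target; None if impossible."""
    if isinstance(pattern, str):
        if pattern in rho:
            return rho if rho[pattern] == target else None
        return {**rho, pattern: target}
    if isinstance(target, str) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        rho = match(p, t, rho)
        if rho is None:
            return None
    return rho

def is_semi_unifier(sigma, sei):
    """Equations must become equal; each group needs its own quotient rho."""
    equations, groups = sei
    if any(apply(sigma, m) != apply(sigma, n) for m, n in equations):
        return False
    for group in groups:
        rho = {}
        for m, n in group:
            rho = match(apply(sigma, m), apply(sigma, n), rho)
            if rho is None:
                return False
    return True

def is_uniform_semi_unifier(sigma, sei):
    """One quotient substitution must serve every group: merge the groups."""
    equations, groups = sei
    merged = [ineq for group in groups for ineq in group]
    return is_semi_unifier(sigma, (equations, [merged]))

def is_unifier(sigma, sei):
    """The quotient is the identity: both sides of everything become equal."""
    equations, groups = sei
    pairs = list(equations) + [ineq for group in groups for ineq in group]
    return all(apply(sigma, m) == apply(sigma, n) for m, n in pairs)

# x <= c1 and x <= c2 in separate groups: the identity is a semi-unifier,
# but no single quotient substitution can send x to both constants.
separate_groups = ([], [[("x", ("c1",))], [("x", ("c2",))]])
print(is_semi_unifier({}, separate_groups))
print(is_uniform_semi_unifier({}, separate_groups))
```

Merging all groups into one is used here as the test for uniformity because a uniform semi-unifier needs a single quotient substitution for every inequality.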

Clearly, for unifiers there is no need to distinguish between equations and inequalities, and
we can view, in this case, an SEI $S = (\mathcal{E}, \mathcal{I})$ as a system of equations alone made up of $\mathcal{E} \cup \bigcup \mathcal{I}$.

It  is  well-known  that  a  set  of  equations  can  be  expressed  by  a  single  equation  in  the  sense 
that  the  set  of  its  solutions  (unifiers)  is  identical  to  the  set  of  solutions  of  the  original  set  of 
equations.  An  analogous  result,  with  the  same  simple  proof,  holds  for  uniform  semi-unifiers,  but 
apparently  not  for  nonuniform  semi-unifiers. 

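Proposition 1    The following statements are equivalent.

1.  $A$ is nonlinear.

2.  $\{U(S) : S \in \Gamma(A, V)\} = \{U(S) : S \in \Gamma(A, V),\ S = (\mathcal{E}, \mathcal{I}),\ |\mathcal{E}| \leq 1,\ |\mathcal{I}| = 0\}$

3.  $\{USU(S) : S \in \Gamma(A, V)\} = \{USU(S) : S \in \Gamma(A, V),\ S = (\mathcal{E}, \mathcal{I}),\ |\mathcal{E}| \leq 1,\ |\mathcal{I}| \leq 1,\ (\forall I \in \mathcal{I})\ |I| = 1\}$

4.  $\{SU(S) : S \in \Gamma(A, V)\} = \{SU(S) : S \in \Gamma(A, V),\ S = (\mathcal{E}, \mathcal{I}),\ |\mathcal{E}| \leq 1,\ (\forall I \in \mathcal{I})\ |I| = 1\}$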

Proof: 

Statements 2, 3, and 4 follow from 1 by "tupling". For given SEI $S$ form term $M_1$ by
tupling all the left-hand sides of $S$, and $M_2$ by tupling all the right-hand sides. Define
$S' = (\{M_1 = M_2\}, \emptyset)$; this proves 2. For 3 and 4 proceed similarly by tupling both
sides of equations and all inequalities separately, respectively by tupling equations
and the groups of inequalities separately.
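
As a small worked illustration of tupling (the functors $f$ and $g$ are assumptions made for this example only, with $f$ binary and $g$ unary), a system of two equations collapses into one equation with the same unifiers:

$$\{\, x = g(y),\ z = x \,\} \quad\leadsto\quad f(x, z) = f(g(y), x)$$

A substitution $\sigma$ satisfies $\sigma(f(x, z)) = \sigma(f(g(y), x))$ exactly when $\sigma(x) = \sigma(g(y))$ and $\sigma(z) = \sigma(x)$, because two terms with the same top-level functor are equal if and only if their corresponding arguments are equal. This is also why a functor of arity at least 2 is needed.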

Each  of  2,  3,  and  4  individually  imply  1,  which  indicates  that  the  ability  to  "tuple" 
is  instrumental  in  embedding  the  theories  of  semi-unifiers  and  unifiers  in  the  above 
subclasses  of  systems  of  equations  and  inequalities.  We  only  prove  3  =>  1,  the  other 
implications  being  very  similar. 

Assume $\{USU(S) : S \in \Gamma(A, V)\}$ and $\{USU(S) : S \in \Gamma(A, V),\ S = (\mathcal{E}, \mathcal{I}),\ |\mathcal{E}| \leq 1,\ |\mathcal{I}| \leq 1,\ (\forall I \in \mathcal{I})\ |I| = 1\}$
are identical. Consider the SEI $S_1 = (\emptyset, \{\{y_0 \leq x_1, y_0 \leq x_2\}\})$
for pairwise distinct $y_0, x_1, x_2$. Clearly $\sigma_1 = \{x_1 \mapsto x_2\}$ is a semi-unifier of
$S_1$, but $\iota$ is not. If we assume that no functor in $A$ has arity greater than 1, we
already know that all terms in $T(A, V)$ have at most one variable occurrence. Thus
if an inequality $M \leq N$ has a solution at all then there must be subterms $M'$ and
$N'$ of $M$ and $N$, respectively, such that $M' \leq N'$ has the same set of semi-unifiers
as $M \leq N$ and either $M'$ is a variable or $N'$ is a variable or none of $M, M', N, N'$
contains a variable. If $M'$ is a variable then the identity substitution $\iota = \{\}$ is a
semi-unifier, and if it is not, then $\sigma_1$ is not a semi-unifier of $M' \leq N'$, and, finally,
if $M'$ and $N'$ contain no variable then either all substitutions are semi-unifiers of
$M' \leq N'$ (including $\iota$) or none are (excluding $\sigma_1$). This holds also in the presence of
an additional term equation. Consequently there is no SEI with at most one equation
and one inequality with the same set of semi-unifiers as $S_1$ under the assumption that
$A$ has no functor with arity greater than 1, and we can conclude that $A$ must be
nonlinear.

*Note that $(\emptyset, \{\{x \leq c_1\}, \{x \leq c_2\}\})$ has a semi-unifier (the identity substitution $\iota$) but no uniform
semi-unifier.

In view of this proposition, whenever working with nonlinear alphabets we could have defined
systems of equations and inequalities to consist of at most one equation and a set of inequalities
instead of sets of equations and sets of sets of inequalities. We have chosen the present
formulation because it permits a slightly more natural reduction of type inference to semi-unification.
Furthermore, we can give a simple specification for computing most general semi-unifiers by
rewritings over our systems of equations and inequalities, but not so easily if we adopted the
simpler definition.

Nonetheless, when investigating the structure of semi-unifiers over a nonlinear alphabet (as
we shall do almost exclusively) we shall often make use of the possibility of "contracting" sets of
equations and groups of inequalities into single equations and single inequalities. In this vein, we
may often omit the set brackets for singleton sets; e.g., $(\{M = N\}, \{\{M_1 \leq N_1\}, \{M_2 \leq N_2\}\})$
may be written simply $(M = N, \{M_1 \leq N_1, M_2 \leq N_2\})$ or even $(M = N, M_1 \leq N_1, M_2 \leq N_2)$.

3.3      Previous  Work  on  Unification  and  Semi-Unification 

Unification is the problem (and informally also the process) of finding solutions to term equations
of the form $\tau_1 = \tau_2$ where $\tau_1, \tau_2 \in T$. A solution of $\tau_1 = \tau_2$ is a substitution $\sigma$ such that
$\sigma(\tau_1) = \sigma(\tau_2)$.
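
To make the notion of a solution concrete, here is a minimal sketch of a naive recursive unification procedure with an occurs check. It is an illustration only, not one of the linear-time algorithms from the literature; the term encoding (strings for variables, tuples `(functor, arg, ...)` for applications) and all function names are assumptions of this example.

```python
def walk(term, subst):
    """Follow variable bindings until an unbound variable or a functor term."""
    while isinstance(term, str) and term in subst:
        term = subst[term]
    return term

def occurs(var, term, subst):
    """Does var occur in term once the bindings in subst are followed?"""
    term = walk(term, subst)
    if isinstance(term, str):
        return term == var
    return any(occurs(var, arg, subst) for arg in term[1:])

def unify(s, t, subst=None):
    """Return a (triangular) substitution unifying s and t, or None."""
    subst = {} if subst is None else subst
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str):
        return None if occurs(s, t, subst) else {**subst, s: t}
    if isinstance(t, str):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):
        return None  # clash of functors or arities
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

def resolve(term, subst):
    """Apply a triangular substitution exhaustively to a term."""
    term = walk(term, subst)
    if isinstance(term, str):
        return term
    return (term[0],) + tuple(resolve(arg, subst) for arg in term[1:])

left = ("f", "x", ("g", "y"))
right = ("f", ("g", "z"), "x")
sigma = unify(left, right)
# Both sides become the same term under the computed substitution.
print(resolve(left, sigma) == resolve(right, sigma))
```

The occurs check is what rejects an equation such as $x = f(x)$, for which no finite term solves both sides.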

Although  Herbrand  [39]  and  Prawitz  [95]  had  already  used  unification  algorithms,  the  utility 
of  and  interest  in  unification  was  essentially  initiated  by  Robinson's  novel  resolution  principle  in 
theorem  proving  [104]  at  the  heart  of  which  was  a  unification  algorithm. 

Since then papers on unification as well as applications of unification have abounded. While
Robinson's original algorithm took exponential time to compute the solutions, new
representations and algorithms have been found (see, e.g., [89] and [72]) that achieve linear bounds on
the computation time, and the unification problem has been found to be P-complete [111].
Unification is also investigated in term algebras that are subject to equational [109] or
conditional-equational [48] laws such as associativity, commutativity, and idempotence. Several unification
algorithms (e.g., [114], [7], or see [109]) for such term algebras have been presented. Kapur
and Narendran [55] showed that most of these unification problems are NP-hard. Huet [45, 46]
investigated third- and higher-order unification and proved that it is recursively undecidable.
Goldfarb [30] showed that second-order unification is also undecidable.

Unification has permeated the field of resolution-based and even non-resolution-based
theorem proving [5]. With the identification of a subset of First Order Logic that is especially
amenable to resolution theorem proving (Horn Clause Logic, cf. [60]) unification plays an
eminent role in logic programming languages such as Planner [40] and PROLOG [123, 113].

A  concise  and  clean  treatment  of  the  algebraic  aspects  of  unification  can  be  found  in  [63]  or 
in  [26].  A  recent  survey  on  unification  is  [59]. 

Semi-unification addresses the problem of solving inequalities of the form $\tau_1 \leq \tau_2$ where
$\tau_1, \tau_2 \in T$. A substitution $\sigma$ is a solution to $\tau_1 \leq \tau_2$ if there exists a substitution $\rho$ such that
$\rho(\sigma(\tau_1)) = \sigma(\tau_2)$.

Whereas  classical  unification  has  numerous  well-known  uses  and  applications,  semi-unification 
and  related  problems  have  apparently  only  recently  received  attention.  The  question  of  finding 
proofs  with  a  minimum  number  of  proof  steps  in  some  classical  logical  systems  can  be  reduced 
to  unification-like  problems,  in  particular  also  to  semi-unification.  This  sort  of  question  has 
been addressed by Parikh and Statman in the early 70's [112] and, recently, by Krajíček, Pudlák
[96, 61] and other proof theoreticians. Kapur et al. [54] observe that solvability of a single term
inequality yields a sufficient condition for showing nontermination in term rewriting systems,
and they trace the history of this connection back to [62]. Semi-unification [15, 38, 58] has
been shown to be at the heart of type checking in implicitly typed polymorphic programming
languages.  Term  inequalities  have  also  been  explored  as  a  partial  order  theory  for  constraint 
logic  programming  [49,  88]  and,  in  general,  as  a  form  of  "partial  order  programming"  [87].  The 
decidability  of  uniform  semi-unification  (see  chapter  3)  is  proved  independently  in  [96],  [54], 
and  [38]  (see  also  section  6.4).  Another  special  case  of  semi-unification,  in  which  any  identifier 
may  occur  at  most  once  in  left-hand  sides  of  term  inequalities,  is  shown  decidable  in  [58].  The 
decidability  of  general  semi-unification  is  currently  open. 
Chapter 4

Equivalence of Mycroft Calculus
and Semi-Unification


This  chapter  is  divided  into  two  main  sections  and  one  minor  section.  In  the  first  section  we 
show  that  the  type  inference  problems  in  the  Milner  and  the  Mycroft  Calculi  can  be  reduced 
efficiently  to  semi-unification,  the  problem  of  solving  systems  of  equations  and  inequalities  over 
the  subsumption  preordering  of  first-order  terms.  As  a  by-product  we  also  obtain  the  well-known 
reduction  of  the  Hindley  Calculus  to  unification.  The  main  achievement  of  this  reduction  lies  in 
showing  that  the  prefix-quantified  theory  of  type  correctness  in  the  Milner  and  Mycroft  Calculi 
can  be  completely  embedded  in  semi-unification,  a  strictly  first-order  concept.  Similar  reductions 
to some sort of inequality constraints have been found by Kfoury et al. [57] and by Leiss [64].
Their  inequalities,  however,  carry  context  conditions  that  stem  from  type  quantification,  whereas 
our  reduction  is  to  inequalities  that  are  completely  "first-order":  there  are  no  implicit  or  explicit 
constraints  on  variables  in  equations  and  inequalities.  This  makes  semi-unification  an  instance  of 
the  "Generalized  Unification  Problem"  [56]  in  that  all  instances  have  trivial  context  conditions, 
namely  none. 

In  the  second  section,  we  present  the  converse  reduction.  In  fact  we  show  that  semi-unification 
can  be  efficiently  reduced  to  the  Flat  Mycroft  Calculus,  a  small  subclass  of  the  general  Mycroft 
Calculus  that  admits  at  most  one  occurrence  of  the  polymorphically  typed  fix-operator  and  no 
let-operator.  This  can  be  interpreted  as  follows: 

The difficulty of type inference is completely subsumed in a single polymorphically
typed recursive definition. Neither (polymorphic) let-bindings nor nested let- and
fix-bindings add anything to this problem (in contrast to a statement by Mycroft [85]).

This shows that the Mycroft Calculus, the Flat Mycroft Calculus, and semi-unification are
polynomial-time equivalent. This equivalence has several consequences. It answers in the
affirmative a question raised by Kanellakis whether the PSPACE-hardness result for the Milner
Calculus [53] can be extended to the Flat Mycroft Calculus. Also, we obtain a log-space
reduction of unification to typability in the Hindley Calculus, and as a consequence this shows that
the Hindley Calculus is P-complete under log-space reductions. Furthermore, we feel justified
in claiming that semi-unification is the "right" combinatorial problem to look at when
investigating the algorithmic properties of Mycroft-style polymorphic type inference since it comes
with minimal machinery (no quantification, no "syntax", no scoping), yet captures the Mycroft
Calculus up to polynomial time.
Characterizations of type inference by inequality constraints involving quantified types in the
Second Order λ-calculus have been given in [80, 28]. The characterization of polymorphic type
inference by semi-unification in this chapter has also been proved, independently, by Kfoury et
al. [58]; in fact, they have extended it to include the Second Order λ-calculus limited to "rank
2"-derivations [65].

All reductions mentioned here refer to Karp-reductions; i.e., input transformations. Our
reductions  from  type  inference  to  semi-unification  preserve  not  only  the  basic  decision  problem 
(typability),  but  also  map  the  structure  of  typings  to  semi-unifiers.  This  connection  is  exploited 
in  chapter  5  to  transfer  results  about  the  structure  of  semi-unifiers  back  to  typings.  In  particular, 
proof  of  existence  of  most  general  semi-unifiers  can  be  interpreted  as  a  simultaneous  "algebraic" 
proof  of  the  principal  typing  property  for  all  of  CH,  DM,  MM,  and  FMM. 
4.1      Reduction of Typability to Semi-Unification

The reduction from the Mycroft Calculus to semi-unification has already been described in [38].
We give here a detailed presentation of that reduction, supplying all details that were
omitted in [38]. We also correct two minor errors in [38].

4.1.1      Syntax  Trees  and  Variable  Occurrences 

The first step of the reduction consists of labeling the nodes in the syntax trees of λ-expressions
with  monotypes.  For  this  purpose  we  have  to  introduce  some  notions  to  formalize  the  concept 
of  syntax  tree,  binding,  free  and  bound  occurrences,  and  so  on.  The  machinery  necessary  to  do 
this  unfortunately  encumbers  the  overall  exposition  of  the  material  with  heavy  notation  and  a 
multitude  of  definitions.  This  is  mostly  due  to  the  fact  that  the  intuitively  quite  clear  concept 
of a (variable or term) occurrence in an expression is difficult to formalize. Huet [47] defines
occurrences  in  expressions  by  terms  and  integer  sequences  that  specify  a  "path"  from  the  "root" 
of  that  term  to  a  subterm.  We  use  a  different  presentation  that  makes  the  connection  with  the 
graph-theoretic  image  inherent  in  the  term  "syntax  tree"  precise. 

Definition  5   (Term  graph) 

Let $A = (F, a)$ be a ranked alphabet and let $V$ be a set of variables for $A$. A term graph
$G$ over $A$ and $V$ is a quadruple $(N, N_F, E, L)$ where $N$ is a set, $N_F \subseteq N$, $E : N_F \to N^*$,
$L : N \to F \cup V$, and the following conditions hold.

1.  $L(N_F) \subseteq F$ and $(\forall n \in N_F, f \in F)\ L(n) = f \Rightarrow |E(n)| = a(f)$;

2.  $L(N - N_F) \subseteq V$.

The induced (directed) graph of a term graph $G = (N, N_F, E, L)$ is defined as $G' =
(N(G), E(G))$ where $N(G) = N$, $E(G) = \{(n, n') : n \in N_F,\ n' \in N \mid E(n) = (\ldots, n', \ldots)\}$.
The term graph $G$ is acyclic if its induced graph $G'$ is acyclic.

The elements of $N$ are called nodes. A node $n$ is a functor node if $n \in N_F$, and it is a variable
node if $n \in N - N_F$. The mapping $E$ is called an edge map, and if $E(n) = (n_1, \ldots, n_k)$ then $n$
is a parent of all $n_i$, $1 \leq i \leq k$, and the $n_i$ are the children of $n$. For $n \in N$, $L(n)$ is the label of
$n$; $L$ is called a labeling.

Term  graphs  are  graphical  representations  of  terms  that  encode  the  term/subterm  structure 
explicitly  in  their  edge  maps.  Their  definition  is  necessarily  complicated  since  their  nodes  are 
labeled  and  the  out-edges  of  every  node  are  ordered.  The  digraph  induced  by  a  term  graph  is 
just  the  information  left  if  we  ignore  this  particular  "additional"  structure. 

In an acyclic term graph $G = (N, N_F, E, L)$ over $A$ and $V$ every node represents a unique
term. This representation is given by the following mapping $[\cdot] : N \to T(A, V)$.

$$[n] = \begin{cases} x, & n \in N - N_F,\ L(n) = x \\ f([n_1], \ldots, [n_k]), & n \in N_F,\ L(n) = f,\ E(n) = (n_1, \ldots, n_k) \end{cases}$$
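
The mapping from nodes to terms can be sketched in a few lines. This is an illustrative encoding only: a term graph is assumed to be given by a dictionary `labels` (the labeling $L$) and a dictionary `edges` (the edge map $E$, defined exactly on the functor nodes), and the function name `term_of` is hypothetical.

```python
def term_of(node, labels, edges):
    """Return the term [node] of an acyclic term graph.

    labels: dict mapping every node to its label (a functor or a variable).
    edges:  dict mapping every functor node to the tuple of its children.
    Variables are returned as strings, functor terms as tuples.
    """
    if node not in edges:  # variable node: n in N - N_F
        return labels[node]
    children = edges[node]
    return (labels[node],) + tuple(term_of(c, labels, edges) for c in children)

# A graph for f(x, g(x)) in which the single variable node 1 is shared.
labels = {0: "f", 1: "x", 2: "g"}
edges = {0: (1, 2), 2: (1,)}
print(term_of(0, labels, edges))
```

Sharing a node, as with node 1 above, changes the graph but not the term it represents; the recursion terminates only because the graph is assumed acyclic.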

Let $A_\lambda = (\{\lambda, @, \mathrm{let}, \mathrm{fix}\}, \{\lambda \mapsto 2, @ \mapsto 2, \mathrm{let} \mapsto 3, \mathrm{fix} \mapsto 2\})$. Clearly, $A_\lambda$ is an appropriate
alphabet for representing λ-expressions as first-order terms. We can now define what a syntax
tree for a λ-expression is: a special kind of term graph over $A_\lambda$. Since λ-expressions have
variable-binding operators we also define some concepts we shall need later.


'Note  that  [.]  is  implicitly  parameterized  by  G. 
Definition 6   (Syntax tree, free variable occurrences, bound variable occurrence map)

We define the notions of syntax tree, its bindings and scopes, and its free variable occurrences
(FVO) and bound variable occurrence map (BVOM) by simultaneous induction on the structure
of λ-expressions $e$.

$e = x$ (variable): Any one-node term graph $T$ with $N = \{n\}$ and $L(n) = x$ is a syntax tree for
$e$ with root $n$.

$FVO_T = \{n\}$,

$BVOM_T = \{\}$;

$n$ is not a binding.

$e = \lambda x.e'$: If $T'$ is a syntax tree for $e'$ with root $n'$ and $T_x$ is a vertex-disjoint syntax tree for $x$
with root $n_x$ (and no other node) then the term graph $T$ that is the union of $T_x$ and $T'$
with an additional node $n$ and $L(n) = \lambda$, $E(n) = (n_x, n')$ is a syntax tree for $e$ with root $n$.

$FVO_T = FVO_{T'} - \{n' \in FVO_{T'} \mid L(n') = x\}$,
$BVOM_T = BVOM_{T'} \cup \{n_x \mapsto \{n' \in FVO_{T'} \mid L(n') = x\}\}$;

$n_x$ is a λ-binding; its scope is $N(T')$ (the nodes in $T'$).

$e = e'e''$: If $T'$, $T''$ are vertex-disjoint syntax trees for $(e', e'')$ with roots $(n', n'')$, respectively,
then the term graph $T$ that is the union of $T'$ and $T''$ with an additional node $n$ and
$L(n) = @$, $E(n) = (n', n'')$ is a syntax tree for $e$ with root $n$.

$FVO_T = FVO_{T'} \cup FVO_{T''}$,
$BVOM_T = BVOM_{T'} \cup BVOM_{T''}$.

$e = \mathrm{let}\ x = e'\ \mathrm{in}\ e''$: If $T_x$, $T'$, $T''$ are vertex-disjoint syntax trees for $(x, e', e'')$ with roots
$(n_x, n', n'')$, respectively, then the term graph $T$ that is the union of $T_x$, $T'$ and $T''$ with an
additional node $n$ and $L(n) = \mathrm{let}$, $E(n) = (n_x, n', n'')$ is a syntax tree for $e$ with root $n$.

$FVO_T = FVO_{T'} \cup (FVO_{T''} - \{n'' \in FVO_{T''} \mid L(n'') = x\})$,
$BVOM_T = BVOM_{T'} \cup BVOM_{T''} \cup \{n_x \mapsto \{n'' \in FVO_{T''} \mid L(n'') = x\}\}$;

$n_x$ is a let-binding; its scope is $N(T'')$.

$e = \mathrm{fix}\ x.e'$: If $T_x$, $T'$ are vertex-disjoint syntax trees for $x$, $e'$ with roots $n_x$, $n'$, respectively,
then the term graph $T$ that is the union of $T_x$ and $T'$ with an additional node $n$ and
$L(n) = \mathrm{fix}$, $E(n) = (n_x, n')$ is a syntax tree for $e$ with root $n$.

$FVO_T = FVO_{T'} - \{n' \in FVO_{T'} \mid L(n') = x\}$,
$BVOM_T = BVOM_{T'} \cup \{n_x \mapsto \{n' \in FVO_{T'} \mid L(n') = x\}\}$;

$n_x$ is a fix-binding; its scope is $N(T')$.

BVOM is a finite (single-valued) mapping from nodes to finite sets of nodes, and thus it
is treated notationally as a finite set of pairs $n \mapsto \{n_1, \ldots, n_k\}$. Since all syntax trees for a
λ-expression $e$ are isomorphic (i.e., there is a bijection between nodes that transforms any one
syntax tree for $e$ into any other syntax tree for $e$) we shall denote by $T(e)$ a canonical syntax
tree for $e$ and by $N(e)$ the (variable and functor) nodes in $T(e)$.

It is easy to see that our syntax trees are indeed trees: The induced digraph of $T(e)$ is acyclic,
every node has at most one inedge, and exactly one node, the root, has no inedge.
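
The inductive definition of syntax trees, FVO, and BVOM translates directly into a recursive construction. The sketch below is an illustration under assumed encodings: λ-expressions are nested tuples `("var", x)`, `("lam", x, e)`, `("app", e1, e2)`, `("let", x, e1, e2)`, `("fix", x, e)`, nodes are consecutive integers, and the function name `syntax_tree` is hypothetical.

```python
import itertools

def syntax_tree(expr):
    """Build a syntax tree; return (root, labels, edges, fvo, bvom)."""
    labels, edges, bvom = {}, {}, {}
    fresh = itertools.count()

    def new(label, children=None):
        n = next(fresh)
        labels[n] = label
        if children is not None:
            edges[n] = children
        return n

    def bind(x, fvo_scope):
        """Create the binding node n_x and record the occurrences it binds."""
        nx = new(x)
        bound = {m for m in fvo_scope if labels[m] == x}
        bvom[nx] = bound
        return nx, bound

    def build(e):
        kind = e[0]
        if kind == "var":
            n = new(e[1])
            return n, {n}
        if kind == "app":
            n1, f1 = build(e[1])
            n2, f2 = build(e[2])
            return new("@", (n1, n2)), f1 | f2
        if kind in ("lam", "fix"):
            nb, fb = build(e[2])
            nx, bound = bind(e[1], fb)
            label = "λ" if kind == "lam" else "fix"
            return new(label, (nx, nb)), fb - bound
        if kind == "let":
            n1, f1 = build(e[2])   # the defining expression e'
            n2, f2 = build(e[3])   # the body e'', the scope of the binding
            nx, bound = bind(e[1], f2)
            return new("let", (nx, n1, n2)), f1 | (f2 - bound)
        raise ValueError(f"unknown expression kind: {kind}")

    root, fvo = build(expr)
    return root, labels, edges, fvo, bvom

# λx. x y : the occurrence of x is bound, the occurrence of y is free.
root, labels, edges, fvo, bvom = syntax_tree(
    ("lam", "x", ("app", ("var", "x"), ("var", "y"))))
print(sorted(labels[n] for n in fvo))
```

Note that in the let case only the free occurrences in the body are captured, so an occurrence of $x$ inside the defining expression stays free, in line with the scope $N(T'')$.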
4.1.2      Syntax-Oriented Type Inference Systems

The  inference  systems  that  describe  our  typing  calculi  are  not  syntax-oriented.  This  means  that 
for  a  given  expression  e  there  may  be  several  proof  steps  in  a  derivation  that  are  not  compositional 
in terms of the syntax of λ-expressions. This is solely due to the rules (INST) and (GEN) (see
Figure  2.1  in  chapter  2)  since  proof  steps  involving  any  one  of  these  rules  do  not  change  the 
expression  in  a  typing.  In  a  syntax-oriented  system  a  derivation  for  expression  e  has  essentially 
the  same  tree  structure  as  a  syntax  tree  for  e.  The  advantage  of  a  syntax-oriented  inference 
system  is  that  we  can  think  of  a  derivation  for  e  as  an  attribution  of  the  syntax  tree  T(e). 

In  this  subsection  we  present  equivalent  syntax-oriented  type  inference  systems  for  CH,  DM, 
MM,  and  FMM.  In  the  next  subsection  we  show  how  every  derivation  in  these  syntax-oriented 
inference  systems  can  be  translated  into  an  attribution  of  T(e)  that  satisfies  certain  properties, 
and  vice  versa. 

The syntax-oriented versions of CH, DM, MM, FMM will be denoted by a "prime": CH',
DM', MM', FMM'. In general if X is any one of CH, DM, MM, FMM, then X' is the corresponding
syntax-oriented version of X. The list of all axiom and rule schemes that occur in the
syntax-oriented inference systems is given in Table 4.1. Table 4.2 shows which of the axioms and rules
are present in which syntax-oriented calculus, and which ones are not.

For completeness we have included those rules that are unchanged from the original inference
systems. Changed axioms and rules are marked with a "prime" ('). We have taken some liberties
in our notation; in particular the sequence $\bar{\alpha} = \alpha_1 \ldots \alpha_k$ may also be regarded as a set.

Note that the syntax-oriented inference systems do not contain either (INST) or (GEN). The
ability to instantiate polytypes to monotypes has been included in the new axiom, (TAUT'),
for variables; and the ability of (GEN) to form polytypes is localized in applications of the
polymorphic typing rules (LET-P') and (FIX-P'). An additional benefit of the syntax-oriented
versions is that derivable typings are exclusively of the form $A \supset e : \tau$ where $\tau$ is a monotype.
This is one step in the direction of eliminating constraints involving quantified types. Somewhat
paradoxically this corresponds to traversing chronologically backwards the evolution of the Milner
Calculus from the type system with explicitly quantified type expressions [23] to Milner's original
"implicit" distinction of generic and nongeneric type variables [76].

We  shall  now  prove  that  the  new  inference  systems  are  indeed  no  weaker  (or  stronger)  than 
the  original  systems.  First  we  will  need  a  technical  proposition,  though. 

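Proposition 2   Let X be CH, DM, MM, or FMM. For any type environment $A$, λ-expression
$e$, type expressions $\sigma$, $\sigma'$, and type variables $\bar{\alpha} = \alpha_1 \ldots \alpha_k$, $\bar{\alpha}' = \alpha'_1 \ldots \alpha'_k$ we have

1.  $X \vdash A \supset e : \forall\bar{\alpha}.\sigma \iff X \vdash A \supset e : \forall\bar{\alpha}'.\sigma[\bar{\alpha}'/\bar{\alpha}]$

2.  $X \vdash A\{x : \forall\bar{\alpha}.\sigma\} \supset e : \sigma' \iff X \vdash A\{x : \forall\bar{\alpha}'.\sigma[\bar{\alpha}'/\bar{\alpha}]\} \supset e : \sigma'$

Theorem 5  Let X = CH, DM, MM, or FMM. For all type environments $A$, λ-expressions $e$,
type variables $\bar{\alpha} = \alpha_1 \ldots \alpha_k$ not free in $A$ and monotypes $\tau$ we have

$$X \vdash A \supset e : \forall\bar{\alpha}.\tau \iff X' \vdash A \supset e : \tau$$

Corollary 3   For any $e \in \Lambda$, $e$ is typable in X if and only if it is typable in X'.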

For  X  =  DM  this  theorem  is  similar  to  theorem  2.1  in  [16];  and  for  X  =  MM  it  is  almost 
identical  to  proposition  2.1  in  [57].  Note,  however,  that  it  is  technically  a  little  bit  stronger  since 
it  states  that  the  type  of  e  is  literally  identical  in  its  quantifier-free  part,  without  necessitating  a 
renaming of type variables. Similar proofs, localizing applications of the INST rule at the leaves
(variables) of λ-expressions can be found in [79] and [9].
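Let $A$ range over type environments; $x$ over variables; $e, e'$ over λ-expressions; $\bar{\alpha}$ over sequences
of type variables; $\tau, \tau'$ over monotypes, and $\bar{\tau}$ over sequences of monotypes. The following are
type inference axiom and rule schemes for CH', DM', MM', FMM'.

(TAUT')
$$A\{x : \forall\bar{\alpha}.\tau\} \supset x : \tau[\bar{\tau}/\bar{\alpha}]$$

(ABS)
$$\frac{A\{x : \tau'\} \supset e : \tau}{A \supset \lambda x.e : \tau' \to \tau}$$

(APPL)
$$\frac{A \supset e : \tau' \to \tau \qquad A \supset e' : \tau'}{A \supset (e e') : \tau}$$

(LET-M)
$$\frac{A \supset e : \tau \qquad A\{x : \tau\} \supset e' : \tau'}{A \supset \mathrm{let}\ x = e\ \mathrm{in}\ e' : \tau'}$$

(LET-P')
$$\frac{A \supset e : \tau \qquad A\{x : \forall\bar{\alpha}.\tau\} \supset e' : \tau'}{A \supset \mathrm{let}\ x = e\ \mathrm{in}\ e' : \tau'} \quad (\bar{\alpha} \text{ not free in } A)$$

(FIX-M)
$$\frac{A\{x : \tau\} \supset e : \tau}{A \supset \mathrm{fix}\ x.e : \tau}$$

(FIX-P')
$$\frac{A\{x : \forall\bar{\alpha}.\tau\} \supset e : \tau}{A \supset \mathrm{fix}\ x.e : \tau[\bar{\tau}/\bar{\alpha}]} \quad (\bar{\alpha} \text{ not free in } A)$$

Table 4.1: Syntax-oriented axioms and rules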
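
The instantiation $\tau[\bar{\tau}/\bar{\alpha}]$ performed by (TAUT') is a simultaneous substitution of monotypes for the quantified type variables. A minimal sketch, assuming monotypes are encoded as strings (type variables) or tuples such as `("->", a, b)` and `("int",)`, and a type environment as a dictionary from variables to pairs `(alphas, tau)`; the names `substitute` and `taut_prime` are hypothetical:

```python
def substitute(tau, mapping):
    """Simultaneously replace type variables in the monotype tau."""
    if isinstance(tau, str):
        return mapping.get(tau, tau)
    return (tau[0],) + tuple(substitute(arg, mapping) for arg in tau[1:])

def taut_prime(env, x, monotypes):
    """Type given to x by (TAUT'): tau[taus/alphas] where env[x] = (alphas, tau)."""
    alphas, tau = env[x]
    if len(alphas) != len(monotypes):
        raise ValueError("need exactly one monotype per quantified variable")
    return substitute(tau, dict(zip(alphas, monotypes)))

# id : forall a. a -> a, instantiated at int
env = {"id": (("a",), ("->", "a", "a"))}
print(taut_prime(env, "id", [("int",)]))
```

Because the substitution is simultaneous, instantiating $\forall \alpha\beta.\ \alpha \to \beta$ with the monotypes $\beta, \alpha$ swaps the two variables instead of identifying them.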


Axiom/rule    CH'    DM'    MM'    FMM'
TAUT'          √      √      √      √
APPL           √      √      √      √
ABS            √      √      √      √
LET-M          √
LET-P'                √      √
FIX-M          √      √
FIX-P'                       √      √

The mark √ indicates the corresponding axiom/rule is present in the calculus in whose column
it appears; blank space means it is not included. The Flat Mycroft Calculus is restricted to
λ-expressions with no let-operator and with only one occurrence of a fix-operator, which must
occur at top-level.

Table 4.2: The syntax-oriented versions of the Hindley, Milner, Mycroft, and Flat Mycroft type
inference calculi
Somewhat unsurprisingly, the theorem is a consequence of a stronger lemma that can be
shown by structural induction on derivations.

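Lemma 4   Let X = CH, DM, MM, or FMM. For all type environments $A$, λ-expressions $e$, type
variables $\bar{\alpha} = \alpha_1 \ldots \alpha_k$, $\bar{\alpha}$ not free in $A$, and monotypes $\tau$ we have

$$X \vdash A \supset e : \forall\bar{\alpha}.\tau \iff (\forall \bar{\tau} \in M^k)\ X' \vdash A \supset e : \tau[\bar{\tau}/\bar{\alpha}]$$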

Proof: 

$\Rightarrow$:   We proceed by structural induction on MM-derivations. The other cases, CH
and DM, are simplifications of this proof; FMM is a subcase of MM.

(TAUT) If we have a trivial derivation involving only (TAUT) in MM,
$$A\{x : \forall\bar{\alpha}.\tau\} \supset x : \forall\bar{\alpha}.\tau$$
then, by (TAUT') in MM' we have
$$A\{x : \forall\bar{\alpha}.\tau\} \supset x : \tau[\bar{\tau}/\bar{\alpha}]$$
(ABS), (APPL) Trivial.

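(INST) If $A \supset e : \forall\alpha_2 \ldots \alpha_k.\tau[\tau_1/\alpha_1]$ is proved in MM invoking the (INST)
rule,
$$\frac{A \supset e : \forall\bar{\alpha}.\tau}{A \supset e : \forall\alpha_2 \ldots \alpha_k.\tau[\tau_1/\alpha_1]}$$
then, since we may assume by proposition 2 that $\bar{\alpha}$ is not free in $A$ we have,
by the induction hypothesis, that the conclusion
$$A \supset e : \tau[\tau_1/\alpha_1][\tau_2/\alpha_2, \ldots, \tau_k/\alpha_k]$$
is derivable in MM', for any $\tau_2, \ldots, \tau_k$ since
$$\tau[\tau_1/\alpha_1][\tau_2/\alpha_2, \ldots, \tau_k/\alpha_k] = \tau[\tau'_1/\alpha_1, \ldots, \tau'_k/\alpha_k]$$
for some $\tau'_1, \ldots, \tau'_k$.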
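(GEN) If $A \supset e : \forall\bar{\alpha}.\tau$ is proved in MM with the (GEN) rule,
$$\frac{A \supset e : \forall\alpha_2 \ldots \alpha_k.\tau}{A \supset e : \forall\bar{\alpha}.\tau} \quad (\alpha_1 \text{ not free in } A)$$
then, since $\bar{\alpha}$ is not free in $A$ by assumption, we have that $A \supset e : \tau[\bar{\tau}/\bar{\alpha}]$ is
provable in MM', by the induction hypothesis.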
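(LET-P) Assume $A \supset \mathrm{let}\ x = e\ \mathrm{in}\ e' : \forall\bar{\alpha}'.\tau'$ is proved with the (LET-P) rule;
i.e.,
$$\frac{A \supset e : \forall\bar{\alpha}.\tau \qquad A\{x : \forall\bar{\alpha}.\tau\} \supset e' : \forall\bar{\alpha}'.\tau'}{A \supset \mathrm{let}\ x = e\ \mathrm{in}\ e' : \forall\bar{\alpha}'.\tau'}$$
In view of proposition 2 we may assume, w.l.o.g., that $\bar{\alpha}'$ is not free in $A$ and
$A\{x : \forall\bar{\alpha}.\tau\}$. By induction assumption we have, for any $\bar{\tau}'$, that $A \supset e : \tau$
and $A\{x : \forall\bar{\alpha}.\tau\} \supset e' : \tau'[\bar{\tau}'/\bar{\alpha}']$ are derivable in MM'. Consequently,
$$\frac{A \supset e : \tau \qquad A\{x : \forall\bar{\alpha}.\tau\} \supset e' : \tau'[\bar{\tau}'/\bar{\alpha}']}{A \supset \mathrm{let}\ x = e\ \mathrm{in}\ e' : \tau'[\bar{\tau}'/\bar{\alpha}']} \quad (\text{since } \bar{\alpha} \text{ not free in } A) \quad \text{(LET-P')}$$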
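(FIX-P) Assume that $A \supset \mathrm{fix}\ x.e : \forall\bar{\alpha}.\tau$ is derivable in MM by the (FIX-P) rule;
that is,
$$\frac{A\{x : \forall\bar{\alpha}.\tau\} \supset e : \forall\bar{\alpha}.\tau}{A \supset \mathrm{fix}\ x.e : \forall\bar{\alpha}.\tau}$$
W.l.o.g. (proposition 2) we may assume that $\bar{\alpha}$ is not free in $A$ and
$A\{x : \forall\bar{\alpha}.\tau\}$. By the induction hypothesis we know that $A\{x : \forall\bar{\alpha}.\tau\} \supset e : \tau$ is
derivable in MM', and consequently we get
$$\frac{A\{x : \forall\bar{\alpha}.\tau\} \supset e : \tau}{A \supset \mathrm{fix}\ x.e : \tau[\bar{\tau}/\bar{\alpha}]} \quad (\text{since } \bar{\alpha} \text{ is not free in } A) \quad \text{(FIX-P')}$$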

$\Leftarrow$:   It is sufficient to show $X' \vdash A \supset e : \tau \Rightarrow X \vdash A \supset e : \forall\bar{\alpha}.\tau$. We shall prove that
every axiom and rule in MM' is derivable in MM. The proof for X = DM and
CH is similar.

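Note that it is easy (but not completely trivial) to show that
$$\text{(INST*)} \quad \frac{A \supset e : \forall\bar{\alpha}.\tau}{A \supset e : \tau[\bar{\tau}/\bar{\alpha}]}$$
and
$$\text{(GEN*)} \quad \frac{A \supset e : \tau}{A \supset e : \forall\bar{\alpha}.\tau} \quad (\bar{\alpha} \text{ not free in } A)$$
are derivable rule schemes in MM.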
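(TAUT')   Let $MM' \vdash A\{x : \forall\bar{\alpha}.\tau\} \supset x : \tau[\bar{\tau}/\bar{\alpha}]$. We have the following proof
tree in MM:
$$\frac{A\{x : \forall\bar{\alpha}.\tau\} \supset x : \forall\bar{\alpha}.\tau \quad \text{(TAUT)}}{A\{x : \forall\bar{\alpha}.\tau\} \supset x : \tau[\bar{\tau}/\bar{\alpha}]} \quad \text{(INST*)}$$
(APPL), (ABS) Trivial.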
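(LET-P')  In MM' we have
$$\frac{A \supset e : \tau \qquad A\{x : \forall\bar{\alpha}.\tau\} \supset e' : \tau'}{A \supset \mathrm{let}\ x = e\ \mathrm{in}\ e' : \tau'} \quad (\bar{\alpha} \text{ not free in } A)$$
and in MM
$$\frac{\dfrac{A \supset e : \tau}{A \supset e : \forall\bar{\alpha}.\tau}\ (\bar{\alpha} \text{ not free in } A)\ \text{(GEN*)} \qquad A\{x : \forall\bar{\alpha}.\tau\} \supset e' : \tau'}{A \supset \mathrm{let}\ x = e\ \mathrm{in}\ e' : \tau'} \quad \text{(LET-P)}$$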
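(FIX-P')  In MM' we have the rule
$$\frac{A\{x : \forall\bar{\alpha}.\tau\} \supset e : \tau}{A \supset \mathrm{fix}\ x.e : \tau[\bar{\tau}/\bar{\alpha}]} \quad (\bar{\alpha} \text{ not free in } A)$$
and in MM
$$\frac{\dfrac{\dfrac{A\{x : \forall\bar{\alpha}.\tau\} \supset e : \tau}{A\{x : \forall\bar{\alpha}.\tau\} \supset e : \forall\bar{\alpha}.\tau}\ (\bar{\alpha} \text{ not free in } A)\ \text{(GEN*)}}{A \supset \mathrm{fix}\ x.e : \forall\bar{\alpha}.\tau}\ \text{(FIX-P)}}{A \supset \mathrm{fix}\ x.e : \tau[\bar{\tau}/\bar{\alpha}]} \quad \text{(INST*)}$$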

Proof:  (Proof  of  theorem) 
Immediate  from  Lemma  4. 
4.1.3      Consistently Labeled Syntax Trees

In  this  section  we  define  (type)  labeled  syntax-trees  and  a  notion  of  consistency  of  such  labelings. 
We  shall  prove  that  consistently  labeled  syntax-trees  and  derivations  in  the  syntax-oriented 
versions  of  our  type  inference  systems  are  in  a  one-to-one  relation. 

Definition 7 (Typed syntax tree, generic/nongeneric type variable occurrences, well-typed syntax tree)

A typed syntax tree $T^\tau$ is a syntax tree $T$ with a function $\tau : N(T) \to M$, called a type labeling.

For a given type environment $A$, expression $e$ with syntax tree $T = T(e)$, and type labeling $\tau$ for $T$, we say a type variable $\alpha$ is nongeneric at node $n''$ in $T$ if $n''$ is in the scope of a λ-binding $n$, $E(n) = (n_1, n_2)$, and $\alpha$ occurs (free) in $\tau(n_1)$; or if $\alpha$ occurs free in $A$. If $\alpha$ is not nongeneric at $n''$, it is generic at $n''$. $NGTV(n'')$ denotes the set of all nongeneric type variables at $n''$, and $GTV(n'')$ is $TV - NGTV(n'')$.²

For fixed syntax tree $T = (N, N_F, E, L)$ of λ-expression $e$ and type labeling $\tau$, the labeled syntax tree $T^\tau$ is (MM-)consistently labeled under type assignment $A$ if $\tau$ satisfies the following properties.

1.  (Local conditions)
For all $n \in N_F$,

(a)  $L(n) = \lambda,\ E(n) = (n', n'') \Rightarrow \tau(n) = \tau(n') \to \tau(n'')$

(b)  $L(n) = @,\ E(n) = (n', n'') \Rightarrow \tau(n') = \tau(n'') \to \tau(n)$

(c)  $L(n) = \mathbf{let},\ E(n) = (n', n'', n''') \Rightarrow \tau(n') = \tau(n'') \wedge \tau(n''') = \tau(n)$

(d)  $L(n) = \mathbf{fix},\ E(n) = (n', n'') \Rightarrow \tau(n') = \tau(n'') \wedge (\exists R)\ R|_{GTV(n)}(\tau(n'')) = \tau(n)$

2.  (Scoping conditions)
For all $n \in N$,

(a)  if $n$ is a λ-binding then

$(\forall n' \in BVOM_T(n))\ \tau(n) = \tau(n')$

(b)  if $n$ is a let-binding then

$(\forall n' \in BVOM_T(n))(\exists R)\ R|_{GTV(n)}(\tau(n)) = \tau(n')$

(c)  if $n$ is a fix-binding then

$(\forall n' \in BVOM_T(n))(\exists R)\ R|_{GTV(n)}(\tau(n)) = \tau(n')$

3.  (Context condition)

For all $n \in FVO_T$, if $L(n) = x$ and $A(x) = \forall\bar{\alpha}.\tau'$ then $(\exists R)\ R|_{\bar{\alpha}}(\tau') = \tau(n)$.

The labeled tree $T^\tau$ is DM-consistently labeled if it is MM-consistently labeled and the conditions

For all $n \in N$,

•  $L(n) = \mathbf{fix},\ E(n) = (n', n'') \Rightarrow \tau(n') = \tau(n'') = \tau(n)$

•  if $n$ is a fix-binding then

$(\forall n' \in BVOM_T(n))\ \tau(n) = \tau(n')$

are satisfied. $T^\tau$ is CH-consistently labeled if it is DM-consistently labeled and additionally the following constraint is satisfied.

For all $n \in N$,

•  if $n$ is a let-binding then

$(\forall n' \in BVOM_T(n))\ \tau(n) = \tau(n')$

Let X = CH, DM, or MM. An expression $e$ is X-consistently labelable if there is a type labeling $\tau$ for syntax tree $T = T(e)$ such that $T^\tau$ is X-consistently labeled.

Figure 4.1: A consistently labeled syntax tree

²Of course, $NGTV(n'')$ and $GTV(n'')$ are parameterized over $A$, $T(e)$, and $\tau$.

Consider the λ-expression $g(\mathbf{fix}\ f.\lambda x.f\,f)$ in the type environment $\{g : \forall\alpha.(\alpha \to \alpha) \to \alpha\}$. A consistently labeled syntax tree, $T$, for $e$ is presented in Figure 4.1.

The syntax tree in the example has nine nodes, n1, ..., n9. Its only free variable occurrence is the node n2; that is, $FVO_T = \{n2\}$. The node n4 is a fix-binding, and n5 is a λ-binding. The bound variable occurrence map associates with each one of these bindings the set of all their applied occurrences: $BVOM_T = \{n4 \mapsto \{n8, n9\},\ n5 \mapsto \emptyset\}$. Since $A$ contains no free type variable, all type variables are generic at nodes n1, n2, n3, n4, n5, and n6. Since $a$ occurs in the type labeling of n6, $a$ is nongeneric at nodes n7, n8, and n9; all other type variables are generic at n7, n8, n9. Note that $T$ is consistently labeled since all the conditions in definition 7 are satisfied; in particular, the types at the applied occurrences of $f$, n8 and n9, are substitution instances of the type at the fix-binding of $f$, node n4, and $a$, $b$ are generic type variables at n4.

The  following  theorem  shows  that  derivations  in  the  syntax-oriented  type  inference  systems 
are  characterized  by  corresponding  consistently  labeled  syntax  trees  and  vice  versa. 


Theorem 6  Let X = CH, DM, MM. For $A, e, \tau'$: X' $\vdash A \supset e : \tau' \iff (\exists\tau)\ T(e)^\tau$ is an X-consistently labeled syntax tree for $e$ with root $n$, and $\tau(n) = \tau'$.
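Proof:

We shall only give the proof for X = MM. The modifications for the other typing disciplines are trivial.

($\Rightarrow$)  Let $A_0, e_0$ be fixed. Let $T_0 = T(e_0)$ be a syntax tree for $e_0$ with root $n_0$, as usual. Let $P_0$ be an MM'-proof tree for $A_0 \supset e_0 : \tau_0'$. Since MM' is syntax-directed, $P_0$ and $T_0$ are isomorphic, and consequently we can define a mapping $\tau_0 : N(T_0) \to M$ by case analysis on $e$ in $\tau_0(n, A \supset e : \tau)$:

$x$ (variable):
$$\tau_0(n, A \supset e : \tau) = \{n \mapsto \tau\}$$

$\lambda x.e''$: for $E(n) = (n', n'')$, and
$$\frac{A\{x : \tau'\} \supset e'' : \tau''}{A \supset e : \tau}$$
in the proof tree $P_0$,
$$\tau_0(n, A \supset e : \tau) = \{n \mapsto \tau,\ n' \mapsto \tau'\} \cup \tau_0(n'', A\{x : \tau'\} \supset e'' : \tau'')$$

$e'e''$: for $E(n) = (n', n'')$, and
$$\frac{A \supset e' : \tau' \qquad A \supset e'' : \tau''}{A \supset (e'e'') : \tau}$$
in the proof tree $P_0$,
$$\tau_0(n, A \supset e : \tau) = \{n \mapsto \tau\} \cup \tau_0(n', A \supset e' : \tau') \cup \tau_0(n'', A \supset e'' : \tau'')$$

$\mathbf{let}\ x = e''\ \mathbf{in}\ e'''$: for $E(n) = (n', n'', n''')$, and
$$\frac{A \supset e'' : \tau'' \qquad A\{x : \forall\bar{\alpha}.\tau''\} \supset e''' : \tau}{A \supset \mathbf{let}\ x = e''\ \mathbf{in}\ e''' : \tau}$$
in the proof tree $P_0$,
$$\tau_0(n, A \supset e : \tau) = \{n \mapsto \tau,\ n' \mapsto \tau''\} \cup \tau_0(n'', A \supset e'' : \tau'') \cup \tau_0(n''', A\{x : \forall\bar{\alpha}.\tau''\} \supset e''' : \tau)$$

$\mathbf{fix}\ x.e''$: for $E(n) = (n', n'')$, and
$$\frac{A\{x : \forall\bar{\alpha}.\tau''\} \supset e'' : \tau''}{A \supset \mathbf{fix}\ x.e'' : \tau}$$
in the proof tree $P_0$,
$$\tau_0(n, A \supset e : \tau) = \{n \mapsto \tau,\ n' \mapsto \tau''\} \cup \tau_0(n'', A\{x : \forall\bar{\alpha}.\tau''\} \supset e'' : \tau'')$$

and, furthermore, $\tau_0(n_0, A_0 \supset e_0 : \tau_0') = \tau_0$.

Now, it is easy to check that $T_0^{\tau_0}$ is an MM-consistently labeled syntax tree with root $n_0$ and $\tau_0(n_0) = \tau_0'$.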
($\Leftarrow$)  Let $A_0, e_0$ be fixed. Let $T_0^{\tau_0}$ be an MM-consistently labeled syntax tree for $e_0$ with root $n_0$. There is an assignment $\mathcal{A}$ from $N(T_0)$ to type environments that satisfies the following properties.
$\mathcal{A}(n_0) = A_0$ and for all $n \in N(T_0)$,

•  if $L(n) = \lambda$, $E(n) = (n', n'')$, then
$\mathcal{A}(n') = \mathcal{A}(n'') = \mathcal{A}(n)\{L(n') : \tau_0(n')\}$

•  if $L(n) = @$, $E(n) = (n', n'')$, then
$\mathcal{A}(n') = \mathcal{A}(n'') = \mathcal{A}(n)$

•  if $L(n) = \mathbf{let}$, $E(n) = (n', n'', n''')$, then
$\mathcal{A}(n') = \mathcal{A}(n''') = \mathcal{A}(n)\{L(n') : \forall\bar{\alpha}(n').\tau_0(n')\}$

•  if $L(n) = \mathbf{fix}$, $E(n) = (n', n'')$, then
$\mathcal{A}(n') = \mathcal{A}(n'') = \mathcal{A}(n)\{L(n') : \forall\bar{\alpha}(n').\tau_0(n')\}$

where $\bar{\alpha}(n')$ consists of all the generic variables at node $n'$ that occur in $\tau_0(n')$.

Now it is straightforward, by induction on the syntax of $e_0$, to show that MM' $\vdash \mathcal{A}(n) \supset [n] : \tau_0(n)$ for all $n \in N(T_0)$.

The proof shows that actually something even stronger is true. We can start with a consistently labeled syntax tree for $e$, construct a proof tree for $e$ from it via the encoding $\mathcal{A}$, and then generate a consistent labeling for $e$ again via $\tau_0$ from the proof tree. This labeling turns out to be the same one we started out with.


4.1.4      Extraction  of  Equations  and  Inequalities 

In  this  section  we  make  the  connection  between  the  consistent  tree  labeling  characterization  and 
solving  a  system  of  equations  and  inequalities  (SEI)  precise. 

The tree labeling characterization gives us a different (yet in principle familiar) formulation of type inference problems. If we initially associate a distinct type variable $\alpha_n$ with every node $n$ in $T$, then the tree characterization gives us a collection of simultaneous constraints of equational form, such as $\alpha_n = \alpha_{n'} \to \alpha_{n''}$, and, essentially, of inequational form $\alpha_{n''} \leq \alpha_n$. A connection between consistent labelings and semi-unification seems close at hand. We have to be a little bit careful, though, since the quotient substitutions in the inequational constraints of consistent labelings carry context conditions: their domains are restricted to generic type variables, the collection of which in turn is a function of the position of the node in the syntax tree where the constraint has to hold. We could always keep track of such context conditions in the form of conditional inequalities $(GTV(n))\,\alpha_{n''} \leq \alpha_n$ (this is essentially the "Generalized Unification Problem" of [58]), but this is not necessary. As we shall see in this subsection, the context conditions can be encoded efficiently in terms of additional (unconditional) inequalities, the specific nature of which captures precisely the fact that the set of generic type variables is generally different from node to node in the same syntax tree. This will indeed lead us to a reduction of consistent labeling to semi-unification.

We shall consider a small but instructive example due to Kfoury to see that it would be wrong in DM and MM to naively label a syntax tree with distinct type variables and then to collect equations from equality constraints in the consistent labeling definition and inequalities of the form $\alpha_{n''} \leq \alpha_n$ when the consistent labeling constraint reads, say, $R|_{GTV(n)}(\tau(n'')) = \tau(n)$ (see constraint 1d).

Let $e_0 = \lambda y.\mathbf{let}\ f = \lambda x.(x\,y)\ \mathbf{in}\ (f\,f)$. A syntax tree $T_0 = T(e_0)$ with nodes n1, ..., n12 is given in Figure 4.2. By proceeding in the naive manner outlined above we associate distinct type variables $\alpha_{n1}, \ldots, \alpha_{n12}$ with each node and collect constraints for an MM-consistent (or DM-consistent) labeling. The equations and inequalities thus constructed are displayed in Table 4.3.

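Figure 4.2: An untypable expression

$$\begin{array}{rclcrcl}
\alpha_{n1} & = & \alpha_{n2} \to \alpha_{n9} & \qquad & \alpha_{n9} & = & \alpha_{n12} \\
\alpha_{n2} & = & \alpha_{n6} & & \alpha_{n3} & = & \alpha_{n8} \\
\alpha_{n4} & = & \alpha_{n5} & & \alpha_{n8} & = & \alpha_{n4} \to \alpha_{n7} \\
\alpha_{n3} & \leq & \alpha_{n10} & & \alpha_{n5} & = & \alpha_{n6} \to \alpha_{n7} \\
\alpha_{n3} & \leq & \alpha_{n11} & & \alpha_{n10} & = & \alpha_{n11} \to \alpha_{n12}
\end{array}$$

Table 4.3: Incorrect SEI for untypable example

This SEI is solvable. For example, the substitution

$$\begin{array}{rcl}
S & = & \{\alpha_{n1} \mapsto (\alpha_{n2} \to \alpha_{n9}),\ \alpha_{n3} \mapsto ((\alpha_{n2} \to \alpha_{n7}) \to \alpha_{n7}), \\
 & & \ \alpha_{n4} \mapsto (\alpha_{n2} \to \alpha_{n7}),\ \alpha_{n5} \mapsto (\alpha_{n2} \to \alpha_{n7}), \\
 & & \ \alpha_{n6} \mapsto \alpha_{n2},\ \alpha_{n8} \mapsto ((\alpha_{n2} \to \alpha_{n7}) \to \alpha_{n7}), \\
 & & \ \alpha_{n10} \mapsto (((\beta_1 \to \alpha_{n9}) \to \alpha_{n9}) \to \alpha_{n9}), \\
 & & \ \alpha_{n11} \mapsto ((\beta_1 \to \alpha_{n9}) \to \alpha_{n9}),\ \alpha_{n12} \mapsto \alpha_{n9}\}
\end{array}$$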
where $\beta_1$ is a "new" type variable not occurring anywhere in the original constraints, is a semi-unifier, in fact the most general in a sense to be made precise in chapter 5. Unfortunately, however, $e_0$ is untypable. This can be seen by looking at the quotient substitutions for the solution $S$. The quotient substitution for the first inequality is $R_1 = \{\alpha_{n2} \mapsto (\beta_1 \to \alpha_{n9}),\ \alpha_{n7} \mapsto \alpha_{n9}\}$ and for the second inequality it is $R_2 = \{\alpha_{n2} \mapsto \beta_1,\ \alpha_{n7} \mapsto \alpha_{n9}\}$. Since $\alpha_{n2}$ is the type of the left node of the λ-binding n1, the type variable $\alpha_{n2}$ is nongeneric at nodes n10 and n11, and consequently our quotient substitutions violate the stipulation that their domain may only include generic type variables. Because of the two occurrences of $\beta_1$ there is no way of simply "making" $\alpha_{n2}$ equal to both $R_1(\alpha_{n2})$ and $R_2(\alpha_{n2})$. Since every other solution of the above constraints must be a substitution refinement of $S$ (the technical details are in chapter 5) there is also no way of doing this for any other substitution. The expression $e_0$ is not consistently labelable, and consequently it is untypable (in the Mycroft Calculus and thus, trivially, in the Hindley and Milner Calculi).

If we can somehow encode with equations and inequalities the context constraint that certain type variables (the nongeneric ones) may not be instantiated to other variables or terms by quotient substitutions of candidate solutions then a consistent labeling can still be reduced to pure semi-unification. This is indeed possible and actually quite simple.³ Consider an inequality $\tau_1 \leq \tau_2$ containing variable $\alpha$. Notice that the quotient substitution $R$ of any uniform semi-unifier $S$ of $\{\tau_1 \leq \tau_2,\ \alpha \leq \alpha\}$ will not instantiate any of the type variables occurring in $S(\alpha)$. This device makes it possible for a solution to instantiate the variable $\alpha$, but it "protects" the resulting term from being instantiated any further by a quotient substitution.

With this insight we can now adjoin the inequality $\alpha_{n2} \leq \alpha_{n2}$ to each of our two inequality constraints in Table 4.3, thus forming small groups of inequalities that have to share the same quotient substitution.⁴ The resulting SEI is indeed unsolvable, in correspondence to the fact that $e_0$ is not consistently labelable. The technical details showing correctness of the transformation that

1.  collects equational and inequality constraints in a "naive" manner (in accordance with the consistent labeling requirements), and

2.  adjoins inequalities of the form $\alpha \leq \alpha$ for every "naively" collected inequality that arises from a node that is in the scope of a λ-binding,

are presented below.

Definition 8  Let $T_0 = T(e_0)$ be a syntax tree for $e_0$ with root $n_0$, and let $t : N(T_0) \to TV$ be an injective mapping from the nodes in $T_0$ to the set of type variables. The canonical system of equations and inequalities $SEI_t(e_0) = SEI_t^{MM}(e_0)$ is $(\mathcal{E}, \mathcal{I})$ where

$$\begin{array}{rcl}
\mathcal{E} & = & \{t(n) = t(n') \to t(n'') : n, n', n'' \in N(T_0) \mid L(n) = \lambda,\ E(n) = (n', n'')\} \\
 & \cup & \{t(n') = t(n'') \to t(n) : n, n', n'' \in N(T_0) \mid L(n) = @,\ E(n) = (n', n'')\} \\
 & \cup & \{t(n') = t(n''),\ t(n''') = t(n) : n, n', n'', n''' \in N(T_0) \mid L(n) = \mathbf{let},\ E(n) = (n', n'', n''')\} \\
 & \cup & \{t(n') = t(n'') : n, n', n'' \in N(T_0) \mid L(n) = \mathbf{fix},\ E(n) = (n', n'')\} \\
 & \cup & \{t(n') = t(n'') : n'\ \lambda\text{-binding},\ n'' \in BVOM_{T_0}(n')\};
\end{array}$$

for every let- or fix-binding $n$, $n' \in BVOM_{T_0}(n)$, we define

$$I_{n,n'} = \{t(n) \leq t(n')\} \cup \{t'' \leq t'' : t'' \in NGTV(n)\};$$

for $n, n', n'' \in N(T_0)$ such that $L(n) = \mathbf{fix}$ and $E(n) = (n', n'')$,

$$I^{\mathrm{fix}}_{n,n''} = \{t(n'') \leq t(n)\} \cup \{t'' \leq t'' : t'' \in NGTV(n)\};$$

and finally,

$$\mathcal{I} = \{I_{n,n'} : (n, n') \in BVOM_{T_0}\} \cup \{I^{\mathrm{fix}}_{n,n''} : n, n', n'' \in N(T_0) \mid L(n) = \mathbf{fix},\ E(n) = (n', n'')\}.$$

³But it has been overlooked by others approaching the same problem (see [57]).

⁴This is the reason why we opted to introduce SEI's with groups of inequalities instead of simple inequalities.

We shall usually drop the subscript from $SEI_t^{MM}(e)$ and simply write $SEI^{MM}(e)$ since the specific nature of $t$ is obviously irrelevant. In a similar fashion we can define $SEI^X(e)$ for X = CH, DM.

Theorem 7  Let X = CH, DM, or MM; let $T_0 = T(e_0)$ be a syntax tree for $e_0$ with root $n_0$; let $t : N(T_0) \to TV$ be an arbitrary injective map; and let $\tau$ be an arbitrary monotype.

There is an X-consistent type labeling $\tau_0$ for $T_0$, with $\tau_0(n_0) = \tau$, if and only if there is a solution $S$ of $SEI_t^X(e_0)$ such that $S(t(n_0)) = \tau$.

Proof: 

As always we shall only consider the case X = MM.

($\Rightarrow$)  Assume $T_0^{\tau_0}$ is a well-typed syntax tree for $e_0$ such that $\tau_0(n_0) = \tau$. Let $SEI_t(e_0) = (\mathcal{E}, \mathcal{I})$ as defined above. Define $S = \{t(n) \mapsto \tau_0(n) : n \in N(T_0)\}$. By assumption, $S(t(n_0)) = \tau$. Furthermore it is easy to see, by checking all four major cases, that all equations in $\mathcal{E}$ are satisfied. Now consider $I_{n,n'}$ where $n$ is a let- or fix-binding and $n'$ is a bound occurrence of $n$. We have

$$\begin{array}{rcl}
S(I_{n,n'}) & = & \{S(t(n)) \leq S(t(n'))\} \cup \{S(t'') \leq S(t'') : t'' \in NGTV(n)\} \\
 & = & \{\tau_0(n) \leq \tau_0(n')\} \cup \{\tau_0(n'') \leq \tau_0(n'') : FV(\tau_0(n'')) \subseteq NGTV(n)\}
\end{array}$$

Since $T_0^{\tau_0}$ is consistently labeled there is a substitution $R_{n,n'}$ such that

$$R_{n,n'}|_{GTV(n)}(\tau_0(n)) = \tau_0(n').$$

This implies that $R_{n,n'}|_{GTV(n)}(\tau_0(n'')) = \tau_0(n'')$ for all $n''$ such that $FV(\tau_0(n'')) \subseteq NGTV(n)$. A similar analysis can be performed for every $I^{\mathrm{fix}}_{n,n''}$. This shows that $S$ is a (proper, nonuniform) semi-unifier of $SEI_t(e_0)$.

($\Leftarrow$)  Assume $S$ is a solution of $SEI_t(e_0)$ such that $S(t(n_0)) = \tau$. Let $T_0$ be a syntax tree for $e_0$ with root $n_0$ as always. Define $\tau_0(n) = S(t(n))$, $n \in N(T_0)$. Clearly, $\tau_0(n_0) = \tau$ by assumption. We shall show that $T(e_0)^{\tau_0}$ is a well-typed syntax tree. Since $S$ is a solution of $SEI_t(e_0) = (\mathcal{E}, \mathcal{I})$ all equalities in $S(\mathcal{E})$ are satisfied and it is easy to see that all equational constraints hold for $T(e_0)^{\tau_0}$ to be well-typed. Observe that, by definition of $NGTV$, the set of non-generic type variables at a node $n$ in $T_0^{\tau_0}$ is exactly the set of type variables occurring in any $\tau_0(n')$ where $n'$ is a λ-binding whose scope contains $n$. We also know that for any $I_{n,n'} \in \mathcal{I}$ there is a quotient substitution $R_{n,n'}$ such that

$$\begin{array}{rcll}
R_{n,n'}(S(t(n))) & = & S(t(n')) & \\
R_{n,n'}(S(t'')) & = & S(t''), & t'' \in NGTV(n).
\end{array}$$

This implies

$$\begin{array}{rcll}
R_{n,n'}(\tau_0(n)) & = & \tau_0(n') & \\
R_{n,n'}(\tau_0(n'')) & = & \tau_0(n''), & n''\ \text{is a λ-binding whose scope contains}\ n.
\end{array}$$

By the observation above we can conclude that $R_{n,n'}$ is the identity on $NGTV(n)$, which shows that $R_{n,n'} = R_{n,n'}|_{GTV(n)}$ and thus $R_{n,n'}|_{GTV(n)}(\tau_0(n)) = \tau_0(n')$. A similar argument holds for $I^{\mathrm{fix}}_{n,n''} \in \mathcal{I}$.

It is obvious that analogous transformations, only with "more" equational constraints and fewer inequality constraints, can be performed that give reductions from DM-consistent labeling and CH-consistent labeling, respectively, to semi-unification. Actually, in the Hindley Calculus there is no problem with context conditions on inequalities in labeled syntax trees since there are no inequational constraints in the first place: all constraints are equational. Consequently the resulting SEI contains only equations, and classical unification produces the most general unifier rapidly for an appropriate representation of type expressions (namely term graphs) and substitutions ("downward closed" equivalences on term graphs). This establishes the connection of type inference with unification (e.g., see [90]). More specifically, it is easy to see that for an expression $e$ of size $n$ we can generate in linear or almost-linear time on a RAM (depending on the encoding of variables) a set $E$ of monotype equations of size $O(n)$ such that $e$ is typable if and only if $E$ is unifiable. $E$ can be checked for unifiability in linear [89, 72] or almost-linear time [43]. This leads to a linear or almost-linear upper bound for the time complexity of deciding typability in the Hindley Calculus. Since the additional inequational constraints in the Milner Calculus seem rather innocuous at first sight, this may have led researchers to incorrectly claim linear or quadratic bounds on type inference for the whole Milner Calculus [65, 81].

Theorem  8   Let  X  =  CH,  DM,  MM,  or  FMM.  Typability  in  X  is  polynomial-time  reducible  to 
semi-unifiability. 

Proof: 

Note that constructing $SEI(e)$ for X can easily be done in polynomial time. By the three previous theorems $SEI(e)$ is solvable if and only if $e$ is typable in X.

Corollary  5   Semi-unifiability  is  PSPACE-hard  (for  polynomial-time  reductions). 

Proof: 

Kanellakis and Mitchell show that typability in the Milner Calculus is PSPACE-hard [53]. The result follows by theorem 8.
4.2      Reduction  of  Semi-Unification  to  the  Flat  Mycroft  Calculus

Semi-unification is a problem without nesting, scoping, context conditions on inequalities, or quantification of (type) variables. These play an eminent role in the definition of typability in the Mycroft Calculus. Nonetheless, as we have seen, the Mycroft Calculus can be efficiently reduced to semi-unification. The Flat Mycroft Calculus is a typing discipline without nesting of polymorphically typed language constructs. Since semi-unification is a basically "flat" problem (scoping and nesting do not enter into its definition) it should not come as a surprise that semi-unification can not only be reduced to the Mycroft Calculus, but in fact to the Flat Mycroft Calculus. Reductions from unification-like problems to typing problems have grown in importance since they allow us to prove lower bounds on the combinatorially simpler unification-like problems and then extend them to their "corresponding" typing problems. Kanellakis and Mitchell proved a combinatorial problem they called polymorphic unification to be PSPACE-hard and extended this lower bound via a polynomial-time reduction to the Milner Calculus [53]. We provide log-space reductions from unifiability to typability in the Hindley Calculus, and from semi-unifiability to typability in the Flat Mycroft Calculus. This shows that

1.  typability in the Hindley Calculus is P-complete,

2.  the Milner Calculus can be reduced to the Flat Mycroft Calculus, thus extending the PSPACE-hardness result for the Milner Calculus to the Flat Mycroft Calculus and thereby answering a question raised by Kanellakis, and

3.  the Mycroft Calculus, the Flat Mycroft Calculus, and semi-unification are polynomial-time equivalent.

4.2.1      Simplification  of  Systems  of  Equations  and  Inequalities 

So far we have used the term "semi-unification" as if it were a single problem while actually it is parameterized by the ranked alphabet over which terms range. In this subsection we show that our usage is justified in that every SEI over any alphabet can be reduced to an equivalent (see below) SEI over the alphabet $A_2 = (\{f\}, \{f \mapsto 2\})$ that contains only a single binary functor. In chapter 5 we shall see that this is the "minimal" possible alphabet, since no linear alphabet can encode enough information to admit the same kind of reduction. To make these reductions effective and efficient we shall assume that infinite ranked alphabets have functors encoded by the binary numerals.

Definition 9   (Equivalent systems of equations and inequalities)

Let $S$ and $S'$ be SEI's, possibly over different alphabets. $S$ and $S'$ are equivalent if

1.  $S$ is semi-unifiable if and only if $S'$ is semi-unifiable, and

2.  $S$ is uniformly semi-unifiable if and only if $S'$ is uniformly semi-unifiable, and

3.  $S$ is unifiable if and only if $S'$ is unifiable.

Replacement  of  Functors  by  Constants 

Let $A_{[\,]}$ be an alphabet that has exactly one functor for any given arity $k \geq 0$. We shall always write $[M_1, \ldots, M_k]$ for the term built up from $M_1, \ldots, M_k$ by the unique $k$-ary functor in $A_{[\,]}$. We shall address the collection of all these constructors as the list functor since we may view $[\ldots]$ as a single functor that has no arity requirements. W.l.o.g., we may always assume that $A_{[\,]}$ is disjoint⁵ from any other alphabet we may consider. Let $A$ be any ranked alphabet, and let $A_c$ be the alphabet that consists of $A_{[\,]}$ and all the functors from $A$, but such that every functor from $A$ has its arity changed to 0. Define the transformation function $\mu_c : T(A, V) \to T(A_c, V)$,

$$\begin{array}{rcll}
\mu_c(x) & = & x, & \text{if } x \in V \\
\mu_c(f(M_1, \ldots, M_n)) & = & [f, [\mu_c(M_1), \ldots, \mu_c(M_n)]], & \text{otherwise}
\end{array}$$

The transformation $\mu_c$ is obviously well-defined and can be extended to SEI's. We have the following lemma.

Lemma 6  For all SEI's $S$ over $T(A, V)$, $S$ and $\mu_c(S)$ are equivalent.

The translation of $f_5(f_2(x_1), x_2)$ via $\mu_c$ returns $[f_5, [[f_2, [x_1]], x_2]]$. It is easy to see that $\mu_c$ can be implemented by a one-way finite state transducer (1FSM-reduction).

Elimination  of  Constants 

Since we have assumed that the constants in $A_c$ are encoded over the binary (unranked) alphabet $\{0, 1\}$, we can represent any constant by a list over 0 and 1.

Let $A_c$ be as above, and let $A_{01} = (\{0, 1\}, \{0 \mapsto 0, 1 \mapsto 0\}) \cup A_{[\,]}$, and define

$$\begin{array}{rcll}
\mu_{01}(x) & = & x, & x \in V \\
\mu_{01}(f) & = & [b_1, \ldots, b_k], & f\ \text{is encoded by}\ b_1 \ldots b_k \in \{0, 1\}^* \\
\mu_{01}([M_1, \ldots, M_n]) & = & [\mu_{01}(M_1), \ldots, \mu_{01}(M_n)] &
\end{array}$$

Again, $\mu_{01}$ can be canonically extended to SEI's over $A_c$. The correctness of this transformation is guaranteed by the next lemma.

Lemma 7  For all SEI's $S$ over $T(A_c, V)$, $S$ and $\mu_{01}(S)$ are equivalent.

The encoding of $[f_5, [[f_2, [x_1]], x_2]]$ via $\mu_{01}$ is $[[1, 0, 1], [[[1, 0], [x_1]], x_2]]$. Again, this translation can be implemented by a one-way finite state machine.
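The two translations can be sketched in a few lines. This is a minimal illustration, not the finite state transducer itself. It assumes a hypothetical representation in which variables are strings, a term $f_i(\ldots)$ is an `App` record carrying the index $i$, a functor demoted to a constant is a `Const` record, list terms are Python lists, and $f_i$ is encoded by the binary numeral of $i$ (which agrees with the worked example, where $f_5$ becomes 1, 0, 1).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class App:
    """f_index(args...) over the original alphabet A (hypothetical representation)."""
    index: int
    args: tuple

@dataclass(frozen=True)
class Const:
    """The functor f_index viewed as a 0-ary constant of A_c."""
    index: int

def mu_c(term):
    """Replace functors by constants: f(M1..Mn) becomes [f, [mu_c(M1)..mu_c(Mn)]]."""
    if isinstance(term, str):            # variables are left unchanged
        return term
    return [Const(term.index), [mu_c(arg) for arg in term.args]]

def mu_01(term):
    """Eliminate constants: each constant becomes the list of its binary digits."""
    if isinstance(term, str):            # variables are left unchanged
        return term
    if isinstance(term, Const):
        return [int(bit) for bit in format(term.index, "b")]
    return [mu_01(member) for member in term]

# The worked example from the text: f5(f2(x1), x2).
example = App(5, (App(2, ("x1",)), "x2"))
step1 = mu_c(example)
step2 = mu_01(step1)
print(step1)
print(step2)
```

The constants 0 and 1 of $A_{01}$ are represented here by the Python integers 0 and 1.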

Elimination  of  List  Constructor 

So far all reductions were 1FSM-reductions. In [37] we presented a 1FSM-reduction of $A_{01}$ into the set of pure (let- and fix-free) λ-expressions that translated unifiability of SEI's into CH-typability. Of course, this reduction is very sensitive to the particular representation of terms, SEI's and λ-expressions, but it is interesting to note that this translation is a purely "lexical" process with respect to the standard "string" representation of terms and λ-expressions that we have assumed throughout. Roughly, the only place where "parsing" is necessary in the following steps is in eliminating the list constructor. Even this parsing is "harmless" in that it can be performed by a log-space bounded transformation.⁶ The transformation below is inspired by (in fact, it is a direct transliteration of) the encoding into the λ-calculus that we used in [37].

Let $A_{01}$ be as above. Recall that $A_2$ is the alphabet with only one functor, which is binary. Let $x_0$ be a variable in $V$, and let $\iota : V \to V$ be an injective map whose range does not contain $x_0$. Define $\mu_2$ as follows.

$$\begin{array}{rcll}
\mu_2(x) & = & \iota(x), & \text{if } x \in V \\
\mu_2(0) & = & f(x_0, f(x_0, x_0)) & \\
\mu_2(1) & = & f(x_0, f(x_0, f(x_0, x_0))) & \\
\mu_2([\,]) & = & f(x_0, x_0) & \\
\mu_2([M_1, \ldots, M_n]) & = & f(f(\mu_2(M_1), N), x_0) & \text{if } \mu_2([M_2, \ldots, M_n]) = f(N, x_0)
\end{array}$$

⁵One can think of ranked alphabets as sets whose elements carry an attribute. In this sense we will often treat ranked alphabets simply as sets.

⁶Context-free languages are in general not known to be contained in DLOG [2].

The different encodings of 0, 1 and $[\,]$ indicate why this reduction works: functor "clashes" (failure due to different functors) in unification or semi-unification instances are encoded by so-called "occurs" checks (see chapter 6). No two of the encodings of 0, 1, $[\,]$ can unify or semi-unify. The $x_0$ in the first argument position of all encodings requires that any quotient substitution map $x_0$ to $x_0$, and the second argument position would force an instantiation of $x_0$ were it to succeed, which is impossible (this is akin to the "adjoining" trick in the reduction of typability to semi-unification).⁷ Lists of one length can never be unified or semi-unified with lists of another length (or 0 and 1) for essentially the same reason, only this time the $x_0$ that forces quotient substitutions to map $x_0$ to $x_0$ is in the second argument position (for no particular reason but to maintain the analogy to the above-mentioned translation into the λ-calculus). Since $x_0$ also occurs, deeply nested, in the first argument of the encodings of lists, semi-unification of lists of different length could only succeed if a quotient substitution maps a nonvariable term containing $x_0$ to $x_0$, which is manifestly impossible, or if it maps $x_0$ to a nonvariable term containing $x_0$, which is also impossible since $x_0$ is "fixed" in the second argument. These considerations lead to the following lemma.

Lemma 8   For all SEI's $S$ over $T(A_{01}, V)$, $S$ and $\mu_2(S)$ are equivalent.

For lemmas 6, 7, and 8 we may assume, by proposition 21 in chapter 3, that any given SEI $S$ is in "normal form"; that is, it has only one equation and one inequality per inequality group. The proofs then proceed by induction on the number of inequalities and within each inequality by structural induction on terms.
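The encoding $\mu_2$ and the way it turns functor clashes into occurs-check failures can be tried out directly. The sketch below is an illustration under stated assumptions, not a proof of Lemma 8: variables are strings, 0 and 1 are Python integers, list terms are Python lists, $f(a, b)$ is the tuple `("f", a, b)`, $\iota$ is taken to be the identity, and the check uses ordinary unification with an occurs check (it does not implement semi-unification).

```python
X0 = "x0"   # the distinguished variable x_0

def f(a, b):
    return ("f", a, b)

def mu_2(term):
    """Encode a term over A_01 as a term over A_2 (iota taken as the identity)."""
    if isinstance(term, str):
        return term
    if isinstance(term, int):
        return f(X0, f(X0, X0)) if term == 0 else f(X0, f(X0, f(X0, X0)))
    if not term:                              # the empty list []
        return f(X0, X0)
    _, n, _ = mu_2(term[1:])                  # mu_2 of the tail has the shape f(N, x0)
    return f(f(mu_2(term[0]), n), X0)

def walk(t, sub):
    """Follow variable bindings in `sub` until an unbound variable or a compound term."""
    while isinstance(t, str) and t in sub:
        t = sub[t]
    return t

def occurs(var, t, sub):
    t = walk(t, sub)
    if isinstance(t, str):
        return t == var
    return occurs(var, t[1], sub) or occurs(var, t[2], sub)

def unify(a, b, sub=None):
    """Return a unifier (dict in triangular form) or None; fails on the occurs check."""
    sub = {} if sub is None else sub
    a, b = walk(a, sub), walk(b, sub)
    if a == b:
        return sub
    if isinstance(a, str):
        return None if occurs(a, b, sub) else {**sub, a: b}
    if isinstance(b, str):
        return unify(b, a, sub)
    sub = unify(a[1], b[1], sub)
    return None if sub is None else unify(a[2], b[2], sub)

clash_0_1 = unify(mu_2(0), mu_2(1))                # constants 0 and 1 must not unify
clash_len = unify(mu_2(["a"]), mu_2(["a", "b"]))   # lists of different length
same_len = unify(mu_2(["a"]), mu_2(["b"]))         # lists of equal length may unify
print(clash_0_1, clash_len, same_len)
```

In each failing case the unifier would have to bind `x0` to a term containing `x0`, which is the occurs-check encoding of a functor clash described above.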

Theorem 9  Semi-unifiability, uniform semi-unifiability, and unifiability over any ranked alphabet $A$ are log-space reducible to semi-unifiability, uniform semi-unifiability, and unifiability, respectively, over alphabet $A_2$.

Proof: 

By  lemmas  6,  7,  and  8,  and  the  definition  of  equivalence  of  SEI's. 

Henceforth  we  shall  assume  that,  w.l.o.g.,  our  ranked  alphabet  over  which  terms  are  formed 
is  A2,  the  minimal  nonlinear  alphabet  that  contains  only  one  functor,  /,  which  is  binary. 

4.2.2     Reduction  of  Term  Equations  to  the  Hindley  Calculus 

Terms over $A_2$ (or even $A_{[\,]}$) can be encoded in the familiar way in which lists are usually represented in the pure λ-calculus. We will show that this encoding is in fact a log-space reduction of unifiability to the typability problem in the Hindley Calculus (with only pure λ-expressions), which we will also call the simple typability problem.


⁷Of course, this argument remains valid if we substitute any term whatsoever for $x_0$ (but the same one for every occurrence of $x_0$).
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A-representation  of  Terms 

We  shall  assume,  w.l.o.g.,  that  the  set  V  of  variables  we  have  used  in  terms  is  identical  to  the  set, 
also  denoted  by  V ,  of  variables  that  can  occur  in  A-expressions.  Let  also  xq  be  a  distinguished 
element  of  V,  and  »  :  V  — >  V  an  injective  mapping  whose  range  does  not  contain  zq-  Define  the 
mapping  fix  :  T{A'2,  V)  — »  A  as  follows. 


Ma(2)  =     »(*),  if  2  e  V 

tix{f{MuM2))     =     Azo  a:oMA(A/i)MA(A/2) 


We  shall  abbreviate  Azo-*oeie2  to  [61,62].  Generally,  the  expression  Azo-3!o6i  •••6*  will  be 
written  [ei,...,ej].  Instead  of^;^(A/)  we  may  also  write  M.  We  let  Zq  be  a  "reserved"  variable 
that  cannot  occur  in  any  term,  whence  we  may  assume  that  i  is  the  identity  function. 

The map $\mu_\lambda$ not only gives us an encoding of terms as λ-expressions, but also in the form
of the types of these λ-expressions in CH, DM, MM, and FMM: there is no difference as to
which typing system we choose since the encodings are only pure λ-expressions, and for pure
λ-expressions the typing rules in our type calculi are identical. To encode a term equation $M = N$
as a pure λ-expression all we have to do now is to "force" the types of $\overline{M}$ and $\overline{N}$ to be equal.
This is easily achieved by applying a λ-bound variable, $g$, to both $\overline{M}$ and $\overline{N}$. Since we can assume
that a nontrivial unifiability problem instance consists of only one equation (see proposition 21
in chapter 3) we can thus extend $\mu_\lambda$ to term equations over $A_2$ and $V$ by

$$\mu_\lambda(M = N) = \lambda g.[g\overline{M}, g\overline{N}] \quad \text{where } g \notin FV(M) \cup FV(N) \cup \{x_0\}.$$

For convenience' sake (and by abuse of notation) we will simply write $\overline{M = N}$ for $\lambda g.[g\overline{M}, g\overline{N}]$
(which is already an abbreviation).
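
The encoding of terms and of a single equation can be sketched as follows. This is only an illustration under our own assumptions: variables are Python strings, $f(M_1, M_2)$ is the tuple `("f", M1, M2)`, λ-expressions are rendered as ASCII strings with `\x0.` standing for $\lambda x_0.$, and the function names are hypothetical.

```python
# Terms over A_2: a variable is a str, f(M1, M2) is the tuple ("f", M1, M2).
# Lambda-expressions are rendered as strings; "\x0." stands for "lambda x0.".

def free_vars(term):
    """Set of variables occurring in a term."""
    if isinstance(term, str):
        return {term}
    return free_vars(term[1]) | free_vars(term[2])

def bracket(*parts):
    """[e1, ..., ek] abbreviates  \\x0. x0 e1 ... ek."""
    return "(\\x0. x0 " + " ".join(parts) + ")"

def encode_term(term):
    """The map mu_lambda, with iota taken to be the identity."""
    if isinstance(term, str):
        if term == "x0":
            raise ValueError("x0 is reserved and may not occur in a term")
        return term
    return bracket(encode_term(term[1]), encode_term(term[2]))

def encode_equation(m, n, g="g"):
    """Encoding of M = N as  \\g. [g M, g N]  with g fresh."""
    if g in free_vars(m) | free_vars(n) | {"x0"}:
        raise ValueError("g must not occur in M or N and must differ from x0")
    left = "(" + g + " " + encode_term(m) + ")"
    right = "(" + g + " " + encode_term(n) + ")"
    return "(\\" + g + ". " + bracket(left, right) + ")"
```

For example, `encode_term(("f", "x", "y"))` renders $[x, y]$ in unabbreviated form. The output grows linearly with the input term, in keeping with the log-space claim, although the sketch itself makes no attempt to run in logarithmic space.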

Correctness 

It is easy to see that $\mu_\lambda$ can be computed in logarithmic space. To complete the reduction from
unifiability to simple typability, it remains to be shown that $\mu_\lambda$ is indeed a problem reduction;
more precisely, we will show that for all $M, N \in T(A_2, V)$, it holds that the SEI $(M = N)$ is
solvable if and only if there is a typing for the λ-expression $\lambda\vec{x}.\overline{M = N}$ derivable in the Hindley
Calculus where $\vec{x} = FV(M) \cup FV(N)$. Again, as in the first half of this chapter, any sequence $\vec{x}$
may also be viewed as a set.

There are many possible proofs of correctness. For example, we can try to show that the
principal types of $\overline{M}$ and $\overline{N}$ are unifiable if and only if $M$, $N$ are unifiable. This is quite
apparently true, but it is technically rather messy to prove since there are in general many more
type variables in the principal types of $\overline{M}$ and $\overline{N}$ than there are variables in $M$ and $N$. For this
reason we take an approach in which we get rid of these extra type variables by "normalizing"
principal types.

As a proviso to the following discussion let us note that $\lambda\vec{x}.(\overline{M = N})$ is a closed λ-expression,
and it is simply typable if and only if $\{\vec{x} : \vec{\tau}\} \supset \overline{M = N} : \tau'$ is derivable where $\vec{\tau}$ is a sequence
of monotypes and $\tau'$ is also a monotype.* For this reason we shall only work with monotype
environments $A$ here; that is, $A(x) \in \mathcal{M}$ for all $x \in \mathrm{dom}\, A$.

Unifiability Implies Typability     First we show that if a pair of terms is unifiable then the
λ-representation of this unifiability instance is simply typable.

Define the canonical type mapping $\tau$ that maps type environments and terms to monotypes
as follows.


* The notation $\{\vec{x} : \vec{\tau}\}$ is an obvious short-hand.

$$\tau(A, x) = A(x), \quad x \in V$$

$$\tau(A, [M_1, \ldots, M_k]) = (\tau(A, M_1) \to \ldots \to \tau(A, M_k) \to \alpha_0) \to \alpha_0$$

Here $\alpha_0$ denotes a fixed type variable. Once more, we abbreviate $(\tau_1 \to \ldots \to \tau_n \to \alpha_0) \to \alpha_0$
to $[\tau_1, \ldots, \tau_n]$. The following proposition is easy to prove by structural induction over terms.

Proposition 9 Let $A$, $A'$ be type environments, and $M, N_1, \ldots, N_k$ terms whose variables are
contained in the domain of $A$.

1. $\tau(A, M)$ is well-defined and unique.

2. If $A$ is injective then $\tau$ is injective with respect to its second argument; i.e., $\tau(A, N_1) =
\tau(A, N_2)$ implies $N_1 = N_2$.

3. The typing $A \supset \overline{M} : \tau(A, M)$ is derivable.

4. If $\{x_1, \ldots, x_n\}$ is the domain of $A$ then $A = \{x_1 : \tau(A, x_1), \ldots, x_n : \tau(A, x_n)\}$.

Given a substitution $\sigma : V \to T(A_2, V)$ on terms (not type expressions) we define $\sigma(A)$, the
application of $\sigma$ to a type environment $A = \{x_1 : \tau_1, \ldots, x_n : \tau_n\}$, as follows.

$$\sigma(A) = \{x_1 : \tau(A, \sigma(x_1)), \ldots, x_n : \tau(A, \sigma(x_n))\}.$$

Note that according to proposition 9, part 4, $\iota(A) = A$ for all $A$ where $\iota$ denotes the identity
substitution.
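
The canonical type mapping and the application of a term substitution to an environment can be sketched as follows. The representation is our assumption, not the original's: type variables are strings, an arrow type $\tau_1 \to \tau_2$ is the tuple `("->", t1, t2)`, terms are strings or `("f", M1, M2)` tuples, and environments and substitutions are dictionaries. The final lines check, on one example only, the identity $\tau(A, \sigma(M)) = \tau(\sigma(A), M)$ that Lemma 10 states in general.

```python
ALPHA0 = "a0"  # the fixed type variable alpha_0

def arrow(*types):
    """Right-nested arrow type t1 -> ... -> tn."""
    result = types[-1]
    for t in reversed(types[:-1]):
        result = ("->", t, result)
    return result

def list_type(*types):
    """[t1, ..., tn] abbreviates (t1 -> ... -> tn -> alpha_0) -> alpha_0."""
    return ("->", arrow(*types, ALPHA0), ALPHA0)

def tau(env, term):
    """Canonical type of a term; env must cover all variables of the term."""
    if isinstance(term, str):
        return env[term]
    return list_type(tau(env, term[1]), tau(env, term[2]))

def apply_subst(sigma, term):
    """Apply a term substitution (identity outside its domain)."""
    if isinstance(term, str):
        return sigma.get(term, term)
    return ("f", apply_subst(sigma, term[1]), apply_subst(sigma, term[2]))

def subst_env(sigma, env):
    """sigma(A) = {x : tau(A, sigma(x)) | x in dom A}."""
    return {x: tau(env, sigma.get(x, x)) for x in env}

# One instance of the identity tau(A, sigma(M)) = tau(sigma(A), M).
A = {"x": "a", "y": "b", "z": "c"}
sigma = {"x": ("f", "y", "z")}
M = ("f", "x", "y")
assert tau(A, apply_subst(sigma, M)) == tau(subst_env(sigma, A), M)
```

Note that `subst_env({}, A)` returns `A` unchanged, which mirrors the remark on the identity substitution.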

Lemma 10 For all terms $M$, type environments $A$, and term substitutions $\sigma$, if $\mathrm{dom}\, A \supseteq
FV(M) \cup FV(N)$, then $\tau(A, \sigma(M)) = \tau(\sigma(A), M)$.

Proof:

We prove this lemma by structural induction on $M$.*

• (Base case) If $M$ is a variable, $x_i$, then

$$\begin{aligned}
\tau(\sigma(A), M) &= \tau(\sigma(A), x_i) \\
&= \tau(A, \sigma(x_i)) && \text{(by definition of } \sigma(A)\text{)} \\
&= \tau(A, \sigma(M))
\end{aligned}$$

• (Inductive case) If $M = [N_1, \ldots, N_k]$ for some terms $N_1, \ldots, N_k$, then

$$\begin{aligned}
\tau(\sigma(A), M) &= \tau(\sigma(A), [N_1, \ldots, N_k]) \\
&= [\tau(\sigma(A), N_1), \ldots, \tau(\sigma(A), N_k)] \\
&= [\tau(A, \sigma(N_1)), \ldots, \tau(A, \sigma(N_k))] && \text{(ind. hyp.)} \\
&= \tau(A, [\sigma(N_1), \ldots, \sigma(N_k)]) \\
&= \tau(A, \sigma([N_1, \ldots, N_k])) \\
&= \tau(A, \sigma(M))
\end{aligned}$$


* It is actually more like a "proof by notation".

This completes the proof.

Lemma 11 For all $M, N \in T(A_2, V)$, if $M$ and $N$ are unifiable then $\lambda\vec{x}.(\overline{M = N})$ is simply
typable (typable in the Hindley Calculus).

Proof: By assumption of the lemma there is a unifier $v$ of $M$, $N$; i.e., $v(M) = v(N)$.
Let $A$ be a type environment whose domain contains sufficiently many variables
(that is, at least all variables in $FV(M) \cup FV(N)$). By proposition 9, part 3, both
$v(A) \supset \overline{M} : \tau(v(A), M)$ and $v(A) \supset \overline{N} : \tau(v(A), N)$ are derivable typings.
According to lemma 10 and by the fact that $v$ is a unifier we have $\tau(v(A), M) =
\tau(A, v(M)) = \tau(A, v(N)) = \tau(v(A), N)$. Call this type $\tau'$. Consequently, with $A' = v(A)$ and for any
$\alpha' \in TV$,

$$A'\{g : \tau' \to \alpha'\} \supset [g\overline{M}, g\overline{N}] : [\alpha', \alpha']$$

and

$$A' \supset (\overline{M = N}) : (\tau' \to \alpha') \to [\alpha', \alpha']$$

are derivable typings, the latter of which shows that $\lambda\vec{x}.(\overline{M = N})$ is simply typable.

Typability Implies Unifiability     We now proceed to prove that if $\lambda\vec{x}.(\overline{M = N})$, for given
terms $M$ and $N$, is typable then $M$ and $N$ are unifiable.

Some preliminary results on the normalization of typings are helpful in facilitating a translation
of types to terms and from typings to substitutions. The normalization function $\nu$ on types
is defined as follows.

$$\nu(\tau) = \alpha, \quad \text{if } \tau = \alpha \text{ and } \alpha \in TV$$

$$\nu(\tau) = [\nu(\tau_1), \ldots, \nu(\tau_n)], \quad \text{if } \tau = (\tau_1 \to \ldots \to \tau_n \to \tau') \to \tau' \text{ for some } \tau_1, \ldots, \tau_n, \tau'$$

$$\nu(\tau) = \alpha_0, \quad \text{otherwise}$$
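
The normalization function can be sketched as follows, again with type variables as strings and arrow types as tuples `("->", t1, t2)` (our representation, not the original's). The sketch assumes $n \ge 1$ in the second clause, which the definition does not state explicitly; under that reading the decomposition, when it exists, is unique.

```python
ALPHA0 = "a0"  # the fixed type variable alpha_0

def list_type(types):
    """[t1, ..., tn] abbreviates (t1 -> ... -> tn -> alpha_0) -> alpha_0."""
    inner = ALPHA0
    for t in reversed(types):
        inner = ("->", t, inner)
    return ("->", inner, ALPHA0)

def nu(t):
    """Normalize a type to the canonical list-type shape."""
    if isinstance(t, str):          # clause 1: a type variable
        return t
    _, domain, result = t
    args, rest = [], domain
    # Peel t1, t2, ... off the domain until what remains equals the result.
    while isinstance(rest, tuple):
        args.append(rest[1])
        rest = rest[2]
        if rest == result:          # clause 2: (t1 -> ... -> tn -> r) -> r
            return list_type([nu(a) for a in args])
    return ALPHA0                   # clause 3: any other type
```

For instance, `nu` maps the type $(a \to b \to c) \to c$ to $[a, b]$, and maps $a \to b$ to $\alpha_0$.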

Proposition 12     1. $\nu$ is well-defined and unique.

2. For any set of type expressions $\tau_1, \ldots, \tau_k$ there is an injective type environment $A$ and
terms $N_1, \ldots, N_k$ such that $\nu(\tau_i) = \tau(A, N_i)$ for all $i$ such that $1 \le i \le k$.

The mapping $\nu$ can be extended to type environments in the standard way: $\nu(A) = \{x_1 :
\nu(\tau_1), \ldots, x_n : \nu(\tau_n)\}$ if $A = \{x_1 : \tau_1, \ldots, x_n : \tau_n\}$.

Lemma 13 For any derivable typing $A \supset \overline{M} : \tau$, the typing $\nu(A) \supset \overline{M} : \nu(\tau)$ is also derivable,
and $\nu(\tau) = \tau(\nu(A), M)$.

Proof:     This can be shown by simple induction on the structure of $M$.

Lemma 14 For all $M, N \in T(A_2, V)$, if $\lambda\vec{x}.(\overline{M = N})$ is simply typable then $M$ and $N$ are
unifiable.

Proof: By assumption, there is a derivable typing $A \supset (\overline{M = N}) : \tau$. By the
definition of $\mu_\lambda$ this expands to

$$A \supset \lambda g.\lambda x_0.(x_0\, (g\overline{M})\, (g\overline{N})) : \tau.$$

Since the typing rules of the Hindley Calculus are syntax-directed, we can conclude,
by "backwards reasoning", that there are type expressions $\tau', \tau_2, \tau_3$ such that $\tau =
(\tau' \to \tau_2) \to (\tau_2 \to \tau_2 \to \tau_3) \to \tau_3$ and, with $A' = A\{g : \tau' \to \tau_2, x_0 : \tau_2 \to \tau_2 \to \tau_3\}$,
both

$$A' \supset \overline{M} : \tau'$$

and

$$A' \supset \overline{N} : \tau'$$

are derivable. Let us define $A'' = \nu(A')$ and $\tau'' = \nu(\tau')$. By lemma 13, the typings

$$A'' \supset \overline{M} : \tau''$$

and

$$A'' \supset \overline{N} : \tau''$$

are both derivable. If $A'' = \{x_1 : \tau''_1, \ldots, x_k : \tau''_k\}$, proposition 12, part 2, implies
that there are terms $M, N_1, \ldots, N_k$ and an injective type environment $A_0$
such that $\tau(A_0, M) = \tau'', \tau(A_0, N_1) = \tau''_1, \ldots, \tau(A_0, N_k) = \tau''_k$. If we define
$\sigma = \{x_1 \mapsto N_1, \ldots, x_k \mapsto N_k\}$, the previous two typings can be rephrased as

$$\sigma(A_0) \supset \overline{M} : \tau(\sigma(A_0), M)$$

and

$$\{x_1 : \tau(A_0, \sigma(x_1)), \ldots, x_k : \tau(A_0, \sigma(x_k))\} \supset \overline{M} : \tau(A'', M)$$

Also by lemma 13 we can conclude $\tau(\sigma(A_0), M) = \tau(A'', M) = \tau(\sigma(A_0), N)$. Finally,
this yields $\tau(A_0, \sigma(M)) = \tau(A_0, \sigma(N))$ by lemma 10 and, since $A_0$ is injective, by
proposition 9, part 2, $\sigma(M) = \sigma(N)$. Consequently, $M$ and $N$ are unifiable.

Theorem 10 For all $M, N \in T(A_2, V)$, $M$ and $N$ are unifiable if and only if $\lambda\vec{x}.(\overline{M = N})$ is
simply typable.

Proof:     Lemma  11  shows  one  direction,  lemma  14  the  other. 

Corollary  15   Simple  typability  (typability  in  the  Hindley  Calculus)  is  P-complete  under  log- 
space  reductions. 

Proof: 

Since simple typability is log-space reducible to unification, it is in P. By theorem 10
the result follows from the fact that unification is P-complete [25].

4.2.3      Reduction of Uniform Semi-Unification

To  gain  an  intuition  into  the  more  complicated  reduction  of  general  (nonuniform)  semi-unification 
problem  instances  to  the  Flat  Mycroft  Calculus  we  shall  consider  a  special  case  here  that  yields 
an  interesting  characterization  of  uniform  semi-unification  in  terms  of  a  restricted  version  of  the 
Flat  Mycroft  Calculus. 

Let IFMM ("Flat Milner-Mycroft Calculus with at most one occurrence of the fix-bound
variable") denote MM restricted to expressions of the form $\mathrm{fix}\, y.e$ where $e$ is a pure λ-expression
containing at most one free occurrence of $y$. By extending the λ-encodings of terms we can
also encode inequalities between terms. For this we need the polymorphic typing rule (FIX-P),
though, and consequently we shall assume the (Flat) Mycroft Calculus when we talk about
typability in the rest of this chapter.

The consistent labeling formulation for MM already gives an indication of how term inequalities
can be captured in the constraints associated with a fix-binding. Note that the type for $y$
in $\mathrm{fix}\, y.e$ in some sense "comes from" the type of $e$ since they have to be equal. Now if we can
"force" $e$ to have the type of the λ-encoding $\overline{M}$ of $M$ and if we can "hide" (in the sense that
it does not affect the type of $e$) somewhere in $e$ the λ-encoding $y = \overline{N}$, then the $y$ in $y = \overline{N}$
is bound to have the same type as $\overline{N}$, but by the typing rules for fix the type of this occurrence of $y$ must
also be a substitution instance of the type of $e$. In other words, we will have encoded the single
term inequality $M \le N$ as an instance of the IFMM typability problem. Since $M$ and $N$, and
consequently $\overline{M}$ and $\overline{N}$, contain in general a lot of free variables we have to be a little bit more
careful than this. To make sure that different occurrences of a free variable $x$, say, in $\overline{M}$ have
the same type everywhere (which corresponds to a semi-unifier uniformly applying the same
substitution to all occurrences of a variable), the variables in $\overline{M}$ and $\overline{N}$ have to be λ-bound
some place, as was the case for encodings of equations (for the same reason, by the way). The
λ-bindings for these variables cannot go outside of the whole expression, as in $\lambda\vec{x}.\mathrm{fix}\, y.e$, since
(now that we are in MM) this would mean that the fix-binding is in the scope of the λ-bindings,
and essentially no type variable in the type of $e$ could be instantiated. Consequently the place
where the λ-bindings have to go is just after the fix-binding: $\mathrm{fix}\, y.\lambda\vec{x}.e$. This in turn complicates
the encoding of the equation $y = \overline{N}$ above, but fortunately everything works out.

Theorem 11 Uniform semi-unifiability and IFMM-typability are log-space equivalent.

Proof:

An inspection of the reduction of MM-typability to semi-unification shows that the
instances of IFMM are reduced to instances of uniform semi-unifiability. Conversely,
consider a single inequality $M \le N$ and the λ-expression

$$\mathrm{fix}\, y.\lambda\vec{x}.K\,\overline{M}\,(\lambda\vec{z}.(y\vec{z} = \overline{N})),$$

which is clearly an instance of IFMM-typability. Here $\vec{x}$ again is the sequence of all
free variables in $M$ and $N$ in any order, and $\vec{z}$ is a sequence of variables with the
same length as $\vec{x}$, but completely disjoint from it. $K$ denotes the term $\lambda x.\lambda y.x$.
Since $\overline{M}$ and $\overline{N}$ can be computed in logarithmic space, this expression can clearly
be computed in logarithmic space from $M \le N$. The correctness of this reduction
automatically falls out of the general case of reducing nonuniform semi-unification to
FMM-typability, which is shown towards the end of this chapter.

Corollary 16 IFMM-typability is P-complete under log-space reductions.

Proof:

Kapur  et  al.  [54]  give  a  complicated  algorithm  for  deciding  uniform  semi-unifiability 
in  polynomial  time  (see  chapter  6  for  more  information  on  their  algorithm).  Since 
unification  is  a  subcase  of  uniform  semi-unification  this  implies  the  theorem. 

This corollary shows that, theoretically, uniform semi-unification is no harder than unification,
although in practice there is a big difference: the polynomial-time algorithm in [54] is
very complicated and executes in a polynomial of some higher degree whereas unification has a
theoretically and practically very fast algorithm, namely the equivalence class merging algorithm
with delayed occurs checking and the union/find data structure (see, e.g., [1, section 6.7]) which
seems to be continuously rediscovered (see, e.g., [107]). A theoretically faster, but less practical
algorithm is the linear time decision algorithm of [89] and [72].
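
To make the comparison concrete, the following is a minimal sketch in the spirit of equivalence class merging with a delayed occurs check; it is not the algorithm of [1] verbatim. Terms are hashable tuples `(functor, arg1, ..., argn)` with variables as strings, so structurally equal subterms share one node; this representation, the function name `unifiable`, and the use of path halving without union by rank are our assumptions.

```python
def unifiable(m, n):
    """Decide unifiability of two first-order terms."""
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]   # path halving
            t = parent[t]
        return t

    def union(s, t):
        # s and t are distinct class representatives; keep a non-variable
        # term as representative whenever the merged class contains one.
        if isinstance(s, str):
            parent[s] = t
        else:
            parent[t] = s

    def merge(s, t):
        s, t = find(s), find(t)
        if s == t:
            return True
        if isinstance(s, tuple) and isinstance(t, tuple):
            if s[0] != t[0] or len(s) != len(t):
                return False                # functor clash
            union(s, t)
            return all(merge(a, b) for a, b in zip(s[1:], t[1:]))
        union(s, t)                         # at least one side is a variable
        return True

    if not merge(m, n):
        return False

    # Delayed occurs check: the graph on equivalence classes must be acyclic.
    state = {}                              # representative -> "active" | "done"

    def acyclic(t):
        r = find(t)
        if state.get(r) == "done":
            return True
        if state.get(r) == "active":
            return False                    # back edge: cyclic "solution"
        state[r] = "active"
        ok = isinstance(r, str) or all(acyclic(c) for c in r[1:])
        state[r] = "done"
        return ok

    return acyclic(m) and acyclic(n)
```

The occurs check is performed once, after all merging is done, rather than at every variable binding; this is what "delayed" refers to. Hashing nested tuples costs time proportional to their size, so this sketch does not attain the near-linear bound of a pointer-based implementation.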

4.2.4      Reduction  of  Semi-Unification 

We have seen how a single inequality can be encoded in the Flat Mycroft Calculus, even under
the restriction that a fix-bound variable may only occur once. Intuitively, it is clear how to
proceed from here to encode a whole system of equations and inequalities:

1. Encode every inequality individually as a recursive definition and view the collection of all
such recursive definitions as a single mutually recursive definition,

2. encode the mutually recursive definition as a single recursive definition in a "standard"
way,

3. and along the way be careful about λ-binding the free variables in the given SEI and do
not forget to add encodings for the equations.

The following technical proposition is used later in the proof of correctness of the reduction
outlined above. We make use of another abbreviation: For fixed $k > 0$, $\bar{\imath} = \lambda x_1 \ldots x_k.x_i$ is the
$i$-th projection function for $1 \le i \le k$.

Proposition  17  Let  X  =  CH,  DM,  MM,  FMM.  Let  f  =  [n, . . .  ,-n,]. 

Xh  ADe:f<^ 

X\-  Ajei:Ti,ie  {l,...,fc} 

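$$X \vdash A \supset [e_1, \ldots, e_k] : [\tau_1, \ldots, \tau_k] \iff X \vdash A \supset e_i : \tau_i,\ i \in \{1, \ldots, k\}$$

$$(\exists \tau')\ A \supset e_1 = e_2 : \tau' \iff (\exists \tau'')\ A \supset e_1 : \tau'' \text{ and } A \supset e_2 : \tau''$$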

Theorem  12   Semi-unification  is  log-space  reducible  to  typability  in  the  Flat  Mycroft  Calculus. 

Proof: 

Without loss of generality we may assume that $A = A_2$. As noted in chapter 2 it is
sufficient to show that any SEI $S = (M_0 = N_0, M_1 \le N_1, \ldots, M_k \le N_k)$ is reducible
to FMM. Let $\vec{x} = x_1 \ldots x_n$ where $x_1, \ldots, x_n$ are all the distinct variables occurring
in $S$; let $\vec{z} = z_1 \ldots z_m$ be $m$ distinct variables not occurring in $S$.
Now consider

$$\lambda(S) = \mathrm{fix}\, y.\lambda\vec{x}.K\,[\overline{M_1}, \ldots, \overline{M_k}]\;[\overline{M_0} = \overline{N_0},\ \lambda\vec{z}.(y\vec{z}\,\bar{1} = \overline{N_1}),\ \ldots,\ \lambda\vec{z}.(y\vec{z}\,\bar{k} = \overline{N_k})]$$

where $K = \lambda x.\lambda y.x$, as usual. $\lambda(S)$ can clearly be constructed in logarithmic space.
We will show that $S$ has a solution if and only if $\lambda(S)$ is typable in the Flat Mycroft
Calculus.
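
The construction of $\lambda(S)$ can be sketched as follows. The sketch only builds a textual rendering and uses our own conventions: terms are strings or `("f", M1, M2)` tuples, `\x.` stands for $\lambda x.$, the bracket abbreviation is kept as `[e1, e2]`, an encoded equation is written out as `(\g. [g e1, g e2])`, and the names `g`, `y`, `K`, `z1, z2, ...` and `p1, p2, ...` are assumed not to occur in $S$. We also take $\vec{z}$ to have the same length as $\vec{x}$, as in the single-inequality case.

```python
def variables(term):
    if isinstance(term, str):
        return {term}
    return variables(term[1]) | variables(term[2])

def enc(term):
    """Term encoding, written with the [e1, e2] abbreviation."""
    if isinstance(term, str):
        return term
    return "[" + enc(term[1]) + ", " + enc(term[2]) + "]"

def eq(e1, e2):
    """Encoded equation between two lambda-expressions given as strings."""
    return "(\\g. [g " + e1 + ", g " + e2 + "])"

def proj(i, k):
    """The i-th projection function on k arguments."""
    params = " ".join("p" + str(j) for j in range(1, k + 1))
    return "(\\" + params + ". p" + str(i) + ")"

def lam_S(equation, inequalities):
    m0, n0 = equation
    k = len(inequalities)
    xs = set(variables(m0) | variables(n0))
    for m, n in inequalities:
        xs |= variables(m) | variables(n)
    xs = sorted(xs)
    zs = ["z" + str(j) for j in range(1, len(xs) + 1)]
    reserved = set(zs) | {"g", "y", "K"}
    if reserved & set(xs):
        raise ValueError("variables of S clash with reserved names")
    zvars = " ".join(zs)
    left = "[" + ", ".join(enc(m) for m, _ in inequalities) + "]"
    items = [eq(enc(m0), enc(n0))]
    for i, (_, n) in enumerate(inequalities, start=1):
        occurrence = "(y " + zvars + " " + proj(i, k) + ")"
        items.append("(\\" + zvars + ". " + eq(occurrence, enc(n)) + ")")
    right = "[" + ", ".join(items) + "]"
    return "fix y. \\" + " ".join(xs) + ". K " + left + " " + right
```

Each inequality contributes exactly one occurrence of the fix-bound variable `y`, applied to fresh variables and to the projection that selects the component of the corresponding left-hand side.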

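Lemma 18 There is a type $\tau$ such that

$$\mathrm{FMM} \vdash \{\} \supset \mathrm{fix}\, y.\lambda\vec{x}.K\,[\overline{M_1}, \ldots, \overline{M_k}]\;[\overline{M_0} = \overline{N_0},\ \lambda\vec{z}.(y\vec{z}\,\bar{1} = \overline{N_1}),\ \ldots,\ \lambda\vec{z}.(y\vec{z}\,\bar{k} = \overline{N_k})] : \tau$$

if and only if there are monotypes $\vec{\tau} = \tau_1 \ldots \tau_n$, $\tau_{M_0}, \tau_{M_1}, \ldots, \tau_{M_k}$, $\tau_{N_0}, \tau_{N_1}, \ldots, \tau_{N_k}$ such that

$$\{\vec{x} : \vec{\tau}\} \supset \overline{M_i} : \tau_{M_i}, \quad 0 \le i \le k$$
$$\{\vec{x} : \vec{\tau}\} \supset \overline{N_i} : \tau_{N_i}, \quad 0 \le i \le k$$
$$\tau_{M_0} = \tau_{N_0}$$
$$\tau_{M_i} \le \tau_{N_i}, \quad 1 \le i \le k$$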

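Proof:

$$\{\} \supset \mathrm{fix}\, y.\lambda\vec{x}.K\,[\overline{M_1}, \ldots, \overline{M_k}]\;[\overline{M_0} = \overline{N_0},\ \lambda\vec{z}.(y\vec{z}\,\bar{1} = \overline{N_1}),\ \ldots,\ \lambda\vec{z}.(y\vec{z}\,\bar{k} = \overline{N_k})] : \tau$$

is derivable for some $\tau$ if and only if there is a $\tau_y$ with type variables $\vec{\alpha} = \alpha_1 \ldots \alpha_n$
such that $\tau$ is a substitution instance of $\tau_y$ and

$$\{y : \forall\vec{\alpha}.\tau_y\} \supset \lambda\vec{x}.K\,[\ldots]\,[\ldots] : \tau_y$$

is derivable in FMM. This, in turn, is derivable if and only if there are $\vec{\tau} = \tau_1 \ldots \tau_n$
and $\tau_M$ such that $\tau_y = \vec{\tau} \to \tau_M$ and

$$\{y : \forall\vec{\alpha}.\vec{\tau} \to \tau_M,\ \vec{x} : \vec{\tau}\} \supset K\,[\ldots]\,[\ldots] : \tau_M$$

is derivable. According to proposition 17, this is the case if and only if

$$\{y : \forall\vec{\alpha}.\vec{\tau} \to \tau_M,\ \vec{x} : \vec{\tau}\} \supset [\overline{M_1}, \ldots, \overline{M_k}] : \tau_M \quad \text{and}$$

$$\{y : \forall\vec{\alpha}.\vec{\tau} \to \tau_M,\ \vec{x} : \vec{\tau}\} \supset [\overline{M_0} = \overline{N_0},\ \lambda\vec{z}.(y\vec{z}\,\bar{1} = \overline{N_1}),\ \ldots,\ \lambda\vec{z}.(y\vec{z}\,\bar{k} = \overline{N_k})] : \tau_=$$

for some type $\tau_=$. Again, by proposition 17, this holds if and only if $\tau_M = (\tau_{M_1} \to
\cdots \to \tau_{M_k} \to \tau_0) \to \tau_0$ and

$$\{y : \forall\vec{\alpha}.\vec{\tau} \to \tau_M,\ \vec{x} : \vec{\tau}\} \supset \overline{M_i} : \tau_{M_i}, \quad i \in \{1, \ldots, k\},$$

$$\{y : \forall\vec{\alpha}.\vec{\tau} \to \tau_M,\ \vec{x} : \vec{\tau},\ \vec{z} : \vec{\tau}^{(i)}\} \supset y\vec{z}\,\bar{\imath} : \tau_{N_i}, \quad i \in \{1, \ldots, k\}, \text{ and}$$

$$\{y : \forall\vec{\alpha}.\vec{\tau} \to \tau_M,\ \vec{x} : \vec{\tau},\ \vec{z} : \vec{\tau}^{(i)}\} \supset \overline{N_i} : \tau_{N_i}$$

for some types $\tau_{M_1}, \ldots, \tau_{M_k}, \tau_{N_1}, \ldots, \tau_{N_k}$ and suitable types $\vec{\tau}^{(i)} = \tau^i_1 \ldots \tau^i_m$. Note
that, w.l.o.g.,

$$\{y : \forall\vec{\alpha}.\vec{\tau} \to \tau_M,\ \vec{x} : \vec{\tau},\ \vec{z} : \vec{\tau}^{(i)}\} \supset y\vec{z}\,\bar{\imath} : \tau_{N_i}, \quad i \in \{1, \ldots, k\}$$

holds if and only if

$$\{y : \forall\vec{\alpha}.\vec{\tau} \to \tau_M,\ \vec{x} : \vec{\tau},\ \vec{z} : \vec{\tau}^{(i)}\} \supset y : \vec{\tau}^{(i)} \to (\alpha_1 \to \ldots \to \alpha_k \to \alpha_i) \to \tau_{N_i}.$$

Since the type of any occurrence of $y$ must be a substitution instance of the type of
$y$ in the type assumption, it follows that $\tau_{N_i}$ must be a type instance of $\tau_{M_i}$. It is
easy to check that this is also sufficient. Since neither $\overline{M_i}$ nor $\overline{N_i}$ contain $y$ or any of
the $z$'s, for any $i$, we can summarize that a necessary and sufficient condition for

$$\{\} \supset \mathrm{fix}\, y.\lambda\vec{x}.K\,[\overline{M_1}, \ldots, \overline{M_k}]\;[\overline{M_0} = \overline{N_0},\ \lambda\vec{z}.(y\vec{z}\,\bar{1} = \overline{N_1}),\ \ldots,\ \lambda\vec{z}.(y\vec{z}\,\bar{k} = \overline{N_k})] : \tau$$

to be FMM-typable is that for some $\vec{\tau} = \tau_1 \ldots \tau_n$ and for some monotypes
$\tau_{M_0}, \tau_{M_1}, \ldots, \tau_{M_k}, \tau_{N_0}, \tau_{N_1}, \ldots, \tau_{N_k}$ we have

$$\{\vec{x} : \vec{\tau}\} \supset \overline{M_i} : \tau_{M_i}, \quad 0 \le i \le k$$
$$\{\vec{x} : \vec{\tau}\} \supset \overline{N_i} : \tau_{N_i}, \quad 0 \le i \le k$$
$$\tau_{M_0} = \tau_{N_0}$$
$$\tau_{M_i} \le \tau_{N_i}, \quad 1 \le i \le k$$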

Proof:     (Proof  of  theorem  continued) 

With this lemma it is sufficient to show that whenever there is a solution of $S$ then
the above constraints can be satisfied, and vice versa.

($\Rightarrow$) Assume there is a solution $\sigma$ of $S$. Let $A_0$ be a type environment that maps
every variable in $\vec{x}$ into a distinct type variable. (Any other type environment
that is injective on $\vec{x}$ will also do.) Now define $\tau_{M_i} = \tau(A_0, \sigma(M_i))$ for $0 \le i \le k$
where $\tau$ is the canonical type mapping from section 4.2.2, and let $\tau_i = \sigma(A_0)(x_i)$.
We have

$$\sigma(A_0) \supset \overline{M_i} : \tau(\sigma(A_0), M_i)$$
$$\sigma(A_0) \supset \overline{N_i} : \tau(\sigma(A_0), N_i)$$

By lemma 10, we have $\tau(\sigma(A_0), M_i) = \tau(A_0, \sigma(M_i))$ and $\tau(\sigma(A_0), N_i) =
\tau(A_0, \sigma(N_i))$. Since $\sigma$ is a semi-unifier of $S$ it furthermore follows for every
$i$ that there is a $\rho_i$ such that $\rho_i(\sigma(M_i)) = \sigma(N_i)$. It is easy to show that the
canonical type mapping $\tau$ above is monotonic (with respect to term subsumption)
in its second argument.

($\Leftarrow$) Recall the function $\nu : \mathcal{M} \to \mathcal{M}$, which normalizes type expressions. Given
types as required such that

$$\{\vec{x} : \vec{\tau}\} \supset \overline{M_i} : \tau_{M_i}, \quad 0 \le i \le k$$
$$\{\vec{x} : \vec{\tau}\} \supset \overline{N_i} : \tau_{N_i}, \quad 0 \le i \le k$$
$$\tau_{M_i} \le \tau_{N_i}, \quad 1 \le i \le k$$

by lemma 13 we know that

$$\{\vec{x} : \nu(\vec{\tau})\} \supset \overline{M_i} : \nu(\tau_{M_i}), \quad 0 \le i \le k$$
$$\{\vec{x} : \nu(\vec{\tau})\} \supset \overline{N_i} : \nu(\tau_{N_i}), \quad 0 \le i \le k$$

are also derivable. Following the proof of lemma 14, we can define a substitution
$\sigma$ (on terms). In the previous step we saw that $\tau$ is monotonic in its second
argument. This argument can be strengthened to show, for injective $A$, $M_1 \le
M_2 \iff \tau(A, M_1) \le \tau(A, M_2)$.

This completes the proof of the theorem.

Corollary  19    The  following  three  problems  are  polynomial-time  equivalent: 

1.  Typability  in  the  Mycroft  Calculus; 

2.  (nonuniform)  semi-unifiability; 

3.  typability  in  the  Flat  Mycroft  Calculus. 

Proof: 

The  steps  (1)  =>  (2)  and  (2)  =>  (3)  are  proved  in  theorems  8  and  12;  (3)  =>  (1)  is 
trivial  since  FMM-typability  instances  are  a  subclass  of  MM-typability  instances. 

This  corollary  stands  in  contradiction  to  a  statement  by  Mycroft  who  suggests  prohibiting 
nested  polymorphically  typed  fix-definitions  "due  to  the  exponential  cost  of  analysing  nested 
fix  definitions"  [85].  Indeed  nesting  does  not  make  things  any  worse  than  they  already  are  in  a 
single  fix  definition. 

Corollary 20     1. The Milner Calculus is polynomial-time reducible to the Flat Mycroft Calculus.

2.   The  Flat  Mycroft  Calculus  is  PSPACE-hard. 
Proof: 

1.  By  theorem  8  the  Milner  Calculus  is  polynomial-time  reducible  to  semi- 
unification,  which  in  turn  is  polynomial-time  reducible  to  the  Flat  Mycroft 
Calculus  by  theorem  12. 

2. By (1) and the PSPACE-hardness result of Kanellakis and Mitchell [53] for the
Milner Calculus.

4.3      Type Inference in B and Semi-Unification

The  programming  language  B  [75]  (now  called  ABC)  has  a  polymorphic  typing  rule  for  recursive 
definitions  and  relies  on  type  inference  to  determine  the  type  correctness  of  programs.  In  contrast, 
Hope  [8]  also  has  a  polymorphic  typing  rule  for  recursively  defined  functions,  but  mandates  that 
their  types  be  explicitly  declared. 

Even  though  there  is  no  type  inference  system  that  specifies  "logically"  type  correctness  in 
B,  it  is  clear  from  algorithm  AA  in  [74]  that  the  type  computed  in  a  recursive  definition  is 
the principal type in a "Flat" Milner-Mycroft style typing system. In fact, AA can be viewed
as  a  variant  of  Mycroft's  semi-algorithm  for  computing  principal  types  in  the  Milner-Mycroft 
Calculus  [85].  AA  is  provably  nonterminating,  and  Meertens  proceeds  to  refine  it  by  adding  a 
criterion  reminiscent  of  our  extended  occurs  check,  but  actually  of  much  broader  applicability, 
that  guarantees  termination  of  the  resulting  algorithm.  Meertens  argued  that  the  absence  of 
higher-order  functions,  nesting,  and  recursive  types  in  B  permitted  uniform  termination  of  his 
type  inference  algorithm. 

Higher-order  functions  and  their  typing  requirements  usually  create  syntactic  and  semantic 
problems  due  to  the  fact  that  they  are  nonmonotonic  in  their  domain  types.  This  is  of  significance 
in the Second Order λ-calculus, but not in the Milner-Mycroft Calculus (since argument types
cannot  be  required  to  be  polymorphic).  In  the  previous  sections  we  have  seen  that  nesting  of 
definitions  does  not  greatly  change  basic  questions  of  type  inference.  This  suggests  that  the  type 
inference  problem  in  B  is  actually  no  simpler  than  type  inference  in  the  Flat  Mycroft  Calculus  and 
the  complete  Milner-Mycroft  Calculus,  in  contrast  to  Meertens'  general  considerations.  Indeed 
we  shall  show  that  semi-unification  can  be  reduced  to  type  inference  for  a  small  subset  of  B, 
which  substantiates  that  neither  higher-order  functions  nor  nested  definitions  greatly  influence 
the  type  inference  problem.  Since  B  has  a  syntactically  simpler  type  system  than  MM  the 
converse  reduction  is  immediate.  This  implies  that  Meertens'  uniformly  terminating  algorithm 
either  proves  decidability  of  semi-unification  or  it  is  not  correct.  In  fact  we  shall  show  that 
Meertens'  algorithm  errs  on  the  safe  side.  There  are  cases  where  the  algorithm  flags  type- 
incorrectness  while  in  fact  there  is  a  derivable  typing  for  it  (and  AA  would  compute  it). 

In  subsection  4.3.1  we  introduce  Pure  B  and  its  typing  system.  In  subsection  4.3.2  we  show 
that  type  inference  in  B  and  semi-unification  are  polynomial  time  equivalent.  We  also  explain 
Meertens'  termination  criterion  in  terms  of  a  criterion  for  our  semi-unification  algorithm  (without 
extended  occurs  check)  and  give  an  example,  both  as  a  semi-unification  problem  and  as  a  Pure 
B  program,  that  shows  where  that  criterion  errs. 

4.3.1      Pure  B 

Pure  B  is  only  a  minute  subset  of  B,  yet  big  enough  to  capture  the  power  of  the  polymorphic 
typing  rule  for  recursive  definitions.  Pure  B  programs  are  given  by  the  following  grammar. 

p ::= HOW'TO x OF x': c
c ::= x OF e | c c | PUT e IN e
e ::= x | (e, e)

where x ranges over a predefined lexical category of identifiers. Even though the semantics of a
language, as we have seen, is not at all necessary to explain B's typing discipline, it helps to get
an intuition for it. HOW'TO x OF x': c defines a procedure x with formal (variable) parameter
x'. The body of the procedure is the command c. A command is either the application of a
procedure to an expression, x OF e, or a sequence of commands. An expression is either a variable
or a pair of expressions. The PUT command copies its first argument to the second argument
by pattern matching. This is all we need in Pure B to encode semi-unification. B, of course, has
more complicated control structures and data types that make it usable in practice.
The type expressions in Pure B are defined by

$$\tau ::= t \mid (\tau, \tau')$$

$$\sigma ::= \tau \to \mathrm{unit} \mid \forall t.\sigma$$

Once again, we shall say type expressions derivable from $\tau$ are monotypes, and type expressions
derivable from $\sigma$ are polytypes. The type expression $\tau \to \mathrm{unit}$ denotes the type of
procedures whose argument is of type $\tau$. Note that procedures are strictly first-order, since they
can only take inputs whose types are built up from type variables and pairing, and that only
procedures can be polymorphic.^{10} The typing rules of Pure B are given in Table 4.4.

Let $A$ range over type environments; $x$ over variables; $e, e'$ over λ-expressions; $t$ over type
variables; $\tau, \tau'$ over monotypes; $\sigma, \sigma'$ over polytypes. The following are type inference axiom and
rule schemes for Pure B.

Name       Axiom/rule

(TAUTD)    $A\{x : \tau\} \supset x : \tau$

(TAUTP)    $A\{x : \forall\vec{t}.\tau \to \mathrm{unit}\} \supset x : \tau[\vec{\tau}/\vec{t}] \to \mathrm{unit}$

(PUT)      $A \supset e : \tau$
           $A \supset e' : \tau$
           ----------------------------------------
           $A \supset \mathrm{PUT}\ e\ \mathrm{IN}\ e'$

(PAIR)     $A \supset e : \tau$
           $A \supset e' : \tau'$
           ----------------------------------------
           $A \supset (e, e') : (\tau, \tau')$

(APPL)     $A \supset x : \tau \to \mathrm{unit}$
           $A \supset e : \tau$
           ----------------------------------------
           $A \supset x\ \mathrm{OF}\ e$

(SEQ)      $A \supset c$
           $A \supset c'$
           ----------------------------------------
           $A \supset c\ c'$

(PROG)     $A\{x : \forall\vec{\alpha}.\tau \to \mathrm{unit}\} \supset c$
           $A\{x : \forall\vec{\alpha}.\tau \to \mathrm{unit}\} \supset x' : \tau$
           ----------------------------------------
           $\supset \mathrm{HOW'TO}\ x\ \mathrm{OF}\ x' : c$

Table 4.4: Type inference axioms and rules for Pure B

A  Pure  B  program  p  is  typable  if  D  p  is  derivable  in  the  type  inference  system  for  Pure  B. 


^{10} This is, in general, an inessential restriction since polymorphic "data" such as "nil" in Pascal can be treated
as nullary polymorphic functions.

4.3.2      Equivalence of Pure B and Semi-Unification

Recall  the  reduction  of  semi-unification  to  the  Flat  Milner-Mycroft  Calculus.  To  facilitate  the 
encoding  of  a  term  inequality  in  Pure  B  we  need  a  representation  of  first-order  terms  by  Pure 
B  expressions  whose  types  correspond  to  these  terms,  and  an  encoding  of  equations  between 
the  types  representing  terms  (the  PUT  command  will  do).  Term  subsumption  inequalities  can 
be  represented  by  the  polymorphic  typing  rule  for  Pure  B  procedures.  Indeed  these  are  all  the 
ingredients  we  need,  and  they  are  readily  available  in  the  type  system  for  Pure  B. 

Consider, as usual, first-order terms over the ranked alphabet $A_2$. Define the encoding
function $\rho : T(A_2, V) \to E$, where $E$ denotes the Pure B expressions, as follows.

$$\rho(x) = x, \quad \text{if } x \in V$$
$$\rho(f(M, N)) = (\rho(M), \rho(N)), \quad \text{otherwise}$$

We shall denote $\rho(M)$ simply by $\overline{M}$. Given a term inequality $M \le N$ the Pure B program

HOW'TO p OF x:
    PUT x IN $\overline{M}$
    p OF $\overline{N}$

where x and p are identifiers not occurring in $M$ or $N$, is typable if and only if $M \le N$ is
solvable. More generally, the program

HOW'TO p OF x:
    PUT x IN $(\overline{M_1}, \ldots, \overline{M_k})$
    PUT $\overline{M_0}$ IN $\overline{N_0}$
    p OF $(\overline{N_1}, y^{(1)}_2, \ldots, y^{(1)}_k)$
    ...
    p OF $(y^{(i)}_1, \ldots, y^{(i)}_{i-1}, \overline{N_i}, y^{(i)}_{i+1}, \ldots, y^{(i)}_k)$
    ...
    p OF $(y^{(k)}_1, \ldots, y^{(k)}_{k-1}, \overline{N_k})$

with p, x, and additional Pure B variables $y^{(i)}_j$, $i \ne j$, not occurring in any of the $M_i$ or $N_i$, is
typable if and only if the SEI $S = (M_0 = N_0, M_1 \le N_1, \ldots, M_k \le N_k)$ is solvable.
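
The program above can be generated mechanically. The following sketch produces its text from an SEI; terms are strings or `("f", M, N)` tuples, and the function names are hypothetical. Since Pure B only has pairs, the sketch writes a $k$-tuple as right-nested pairs; this is our assumption about how the tuple abbreviation unfolds, and any fixed nesting used consistently would serve. The fresh variables $y^{(i)}_j$ are rendered as `yj_i`.

```python
def rho(term):
    """Encode a first-order term as a Pure B expression."""
    if isinstance(term, str):
        return term
    return "(" + rho(term[1]) + ", " + rho(term[2]) + ")"

def variables(term):
    if isinstance(term, str):
        return {term}
    return variables(term[1]) | variables(term[2])

def tup(parts):
    """Write a k-tuple (k >= 1) as right-nested pairs."""
    if len(parts) == 1:
        return parts[0]
    return "(" + parts[0] + ", " + tup(parts[1:]) + ")"

def pure_b_program(equation, inequalities):
    m0, n0 = equation
    k = len(inequalities)
    if k == 0:
        raise ValueError("at least one inequality is expected")
    used = variables(m0) | variables(n0)
    for m, n in inequalities:
        used |= variables(m) | variables(n)
    fresh = {"p", "x"} | {"y%d_%d" % (j, i)
                          for i in range(1, k + 1) for j in range(1, k + 1)}
    if used & fresh:
        raise ValueError("variables of the SEI clash with reserved identifiers")
    lines = ["HOW'TO p OF x:"]
    lines.append("    PUT x IN " + tup([rho(m) for m, _ in inequalities]))
    lines.append("    PUT " + rho(m0) + " IN " + rho(n0))
    for i, (_, n) in enumerate(inequalities, start=1):
        slots = [rho(n) if j == i else "y%d_%d" % (j, i)
                 for j in range(1, k + 1)]
        lines.append("    p OF " + tup(slots))
    return "\n".join(lines)
```

For a single inequality the output reduces to the three-line program shown first, with an additional PUT line for the equation.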

Theorem 13 Typability in Pure B and semi-unification are polynomial-time equivalent.

Proof: 

If  we  choose  the  encodings  described  above  then  the  reductions  can  be  easily  adapted 
from  the  general  reductions  from  MM  to  semi-unification,  and  from  semi-unification 
to  FMM. 

Meertens' non-terminating algorithm AA computes the principal type for Pure B in the sense
that it computes a type expression $\sigma$ for the procedure p defined in HOW'TO p OF x: c such
that this definition is typable and for any other type derivation the type of p, $\sigma'$, will be such
that $\sigma \sqsubseteq \sigma'$ in the generic instance preordering of chapter 2.

Instead of explaining algorithm AA and the refinement that results in a uniformly terminating
algorithm we shall translate the termination criterion for AA into a termination criterion for our
algorithm A and explain its effects in terms of semi-unification. For this, we assume the reader
is familiar with the material in chapter 6. The independent sources of every arrow graph in
an execution of algorithm A are nodes that are already present in the "initial" arrow graph
that represents a given SEI; i.e., the independent sources of any node are "original" nodes.
Meertens' "second circularity check" [74, p. 272] can be translated into a circularity check for
algorithm A as follows. For any arrow graph $G_i$ in an execution $(G_1, \ldots, G_i, \ldots)$ of algorithm
A, let us define a directed graph $CV_i = (N, E_i)$ where $N$ is the set of original nodes (i.e., the
nodes in $G_1$) and $(n, n') \in E_i$ if and only if there are nodes $m, m'$ in $G_i$ such that $n$ is a
source of $m$, $n'$ is a source of $m'$ and $m$ is a parent of $m'$. If, for some $G_i$, the digraph $CV_i$
contains a proper cycle (i.e., nodes $n_1, \ldots, n_k$, $k \geq 2$, such that $n_1 = n_k$, $(n_j, n_{j+1}) \in E_i$ for
$1 \leq j \leq k-1$, and the nodes $n_1, \ldots, n_{k-1}$ are pairwise distinct), then terminate the execution
and signal unsolvability. Clearly, this criterion subsumes our extended occurs check since every
time the occurs check is applicable and reduction to $\Box$ takes place, this circularity check is also
applicable and unsolvability (reduction to $\Box$) is indicated.
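The cycle test at the heart of this circularity check can be sketched as an ordinary depth-first search. The sketch below is only an illustration under stated assumptions: the digraph over the original nodes is assumed to be given already as an explicit edge list, and the names `has_proper_cycle`, `nodes` and `edges` are hypothetical. It does not construct the graph from an arrow graph and is not Meertens' algorithm or algorithm A.

```python
def has_proper_cycle(nodes, edges):
    """Return True if the directed graph (nodes, edges) contains a cycle.

    A self-loop (n, n) counts as a cycle, matching the case k = 2 above.
    """
    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)

    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {n: UNSEEN for n in nodes}

    def visit(n):
        # A node is ACTIVE while it is on the current search path.
        state[n] = ACTIVE
        for m in successors[n]:
            if state[m] == ACTIVE:
                return True          # back edge: a cycle closes at m
            if state[m] == UNSEEN and visit(m):
                return True
        state[n] = DONE
        return False

    return any(state[n] == UNSEEN and visit(n) for n in nodes)


# Hypothetical example graphs over three original nodes.
acyclic = [("x", "y"), ("y", "z")]
looping = [("x", "y"), ("z", "z")]
print(has_proper_cycle(["x", "y", "z"], acyclic))
print(has_proper_cycle(["x", "y", "z"], looping))
```

An execution would stop and signal unsolvability as soon as such a test succeeds for the graph derived from the current arrow graph.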

This algorithm is sound in the sense that whenever it produces a normal arrow graph that is
not $\Box$, the input SEI is solvable. Furthermore, by analogy with Meertens' proof of termination,
algorithm A with the extended occurs check replaced by the above circularity check is uniformly
terminating. Unfortunately, though, the circularity check is too restrictive, and the resultant
algorithm is incomplete: there are solvable SEI's that "trigger" the circularity check. A simple
example is the SEI

$S = \{ g(x) = y,\ g(z) \leq x,\ g(z) \leq y \}$

where g is a unary functor.* It is clearly solvable, the substitution $\sigma = \{x \mapsto g(z'),\ y \mapsto g(g(z'))\}$
being a most general semi-unifier, yet, since z is a source of both $z'$ and $g(z')$ in this semi-unifier,
the CV-graph contains a proper loop from (the node containing) z to z itself.

This  SEI  can  be  translated  into  a  Pure  B  program  via  the  encoding  above  and  submitted  for 
type  checking  by  the  B  type  inference  system.  According  to  the  typing  discipline  described  in 
[74]  and  partially  formalized  by  the  typing  rules  for  Pure  B  in  Table  4.4,  the  resulting  program 
should  be  considered  type  correct,  but  the  type  inference  algorithm  with  the  "second  circularity 
check"  should  flag  a  type  error.  At  present  we  have  not  reconfirmed  this  with  the  locally  available 
B  interpreter. 


* Recall from chapter 4, section 4.2 that we can claim the existence of a functor of any arity in any nonlinear
alphabet.

Chapter 5

Algebraic  Properties  of 
Semi-Unifiers 


In  chapter  4  we  saw  that  semi-unification  characterizes  type  inference  both  in  the  Mycroft  Cal- 
culus and  the  Flat  Mycroft  Calculus.  Since  semi-unification  has  a  simpler  definitional  structure 
than  any  of  the  typing  problems  we  will  shift  our  attention  in  designing  algorithms  to  semi- 
unification.  This  is  justified  since  every  algorithm  for  semi-unification  yields  a  type  inference 
algorithm,  and  vice  versa.  Since  SEI's  have  potentially  many  solutions  it  is  not  a  priori  clear 
which  one  of  the  solutions  an  algorithm  should  compute.  Naturally  we  expect  to  find  an  analog 
of  the  principal  typing  property  for  semi-unification:  that  every  solvable  SEI  has  a  most  general 
semi-unifier  that  is  unique  in  some  sense.  In  this  chapter  we  shall  see  in  which  sense  there  are 
indeed  unique  most  general  semi-unifiers  —  and  in  which  there  are  not. 

A correct treatment of the algebraic structure of semi-unifiers (solutions of term inequalities)
is trickier than is apparent at first sight. This is evidenced by technically incorrect
treatments and statements in the literature [88, 15]. In this chapter we present some results on
the algebraic structure of semi-unifiers. Our main goal is to convince the reader that, in the
same fashion in which strong equivalence classes of idempotent substitutions (see below for
definitions) characterize the solutions of term equations and vice versa (see theorem 17), the weak
equivalence classes of all substitutions characterize the solutions of term inequalities and vice
versa (see theorem 18). In particular, we cannot replace "strong" by "weak" in this statement.
Two substitutions $\sigma_1$ and $\sigma_2$ are strongly equivalent if there are substitutions $\alpha$ and $\alpha'$ such that
$\alpha \circ \alpha' = \iota$, where $\iota$ denotes the identity substitution, and $\alpha \circ \sigma_1 = \sigma_2$. Strong equivalence is the
preferred formalization of the common phrase "equivalent up to renaming of variables" [14, 63].
We will show that, unlike term equations, term inequalities do not have most general solutions
that are unique modulo strong equivalence. A natural, weaker notion of equivalence, however,
admits unique most general solutions and, more generally, induces a complete lattice onto the
set of all solutions of term inequalities.

5.1      Generality  of  Substitutions 

Henceforth let W denote an arbitrary, but fixed subset of V.

Definition 10  (Generality, W-equivalence, strong equivalence)
The preordering $\leq_W$ of generality on $S^\omega$ over W is defined by

$\sigma_1 \leq_W \sigma_2 \iff (\exists \rho \in S^\omega)\ (\rho \circ \sigma_1)|_W = \sigma_2|_W.$

The equivalence relation $\cong_W$ on $S^\omega$ over W is defined by

$\sigma_1 \cong_W \sigma_2 \iff \sigma_1 \leq_W \sigma_2 \wedge \sigma_2 \leq_W \sigma_1$

for all $\sigma_1, \sigma_2 \in S^\omega$. We write $\sigma_1 <_W \sigma_2$ if $\sigma_1 \leq_W \sigma_2$, but $\sigma_1 \not\cong_W \sigma_2$. For any $\sigma \in S^\omega$,
$[\sigma]_W$ denotes the $\cong_W$-equivalence class of $\sigma$ in $S^\omega$. The equivalence relation $\cong_W$ is called
W-equivalence; if W = V, it is called strong equivalence.

If $\sigma_1 \leq_W \sigma_2$ we say that $\sigma_1$ is at least as general as $\sigma_2$ on W. E.g., for $\sigma_1 = \{x \mapsto f(x)\}$, $\sigma_2 =
\{x \mapsto f(y)\}$, $W \subseteq V - \{y\}$, the substitution $\sigma_1$ is at least as general as $\sigma_2$ on V, but $\sigma_2$ is only
at least as general as $\sigma_1$ on W, not on V. Consequently, $\sigma_1$ and $\sigma_2$ are W-equivalent, but not
strongly equivalent. For $M \leq N$ there is a unique most general substitution $\rho$, called the quotient
substitution of N over M, such that $\rho(M) = N$. We shall denote $\rho$ by $N/M$.
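The example above can be checked mechanically. The following is a minimal sketch, not part of the original development. It assumes that variables are encoded as Python strings, compound terms as tuples `(functor, arg1, ...)`, and substitutions as dictionaries; the function names are hypothetical. It also assumes W is passed as a finite set. To model V, pass every variable mentioned in either substitution, which is enough here because both substitutions leave all other variables fixed.

```python
def apply(sub, term):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(term, str):                      # a variable
        return sub.get(term, term)
    return (term[0],) + tuple(apply(sub, arg) for arg in term[1:])


def match(pattern, target, rho):
    """Extend rho so that rho(pattern) == target; return False if impossible."""
    if isinstance(pattern, str):
        if pattern in rho:
            return rho[pattern] == target
        rho[pattern] = target
        return True
    if isinstance(target, str):
        return False
    if pattern[0] != target[0] or len(pattern) != len(target):
        return False
    return all(match(p, t, rho) for p, t in zip(pattern[1:], target[1:]))


def at_least_as_general(sigma1, sigma2, W):
    """Decide sigma1 <=_W sigma2 for a finite variable set W.

    One quotient substitution rho must work for all x in W at once,
    so the same dictionary rho is threaded through every match.
    """
    rho = {}
    return all(match(apply(sigma1, x), apply(sigma2, x), rho) for x in sorted(W))


sigma1 = {"x": ("f", "x")}
sigma2 = {"x": ("f", "y")}
print(at_least_as_general(sigma1, sigma2, {"x", "y"}))   # on all mentioned variables
print(at_least_as_general(sigma2, sigma1, {"x", "y"}))
print(at_least_as_general(sigma2, sigma1, {"x"}))        # y is hidden from W
```

Threading a single `rho` through all variables of W is what makes the restriction to W matter: once y is excluded, the quotient substitution is free to rename it.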

Solutions of SEI's (semi-unifiers and unifiers) are closed with respect to "reasonable"
substitution equivalences. More precisely we have

Proposition 21  If $V(S) \subseteq W \subseteq V$ then for any $\sigma_1$ and $\sigma_2$ such that $\sigma_1 \cong_W \sigma_2$ we have

1.  $\sigma_1 \in U(S) \iff \sigma_2 \in U(S)$

2.  $\sigma_1 \in USU(S) \iff \sigma_2 \in USU(S)$

3.  $\sigma_1 \in SU(S) \iff \sigma_2 \in SU(S)$

Thus the solutions of any SEI S are closed with respect to the equivalence relation $\cong_W$ as long
as W contains at least all variables occurring in S, and every unifier/uniform semi-unifier/semi-unifier
can be viewed as (a representative of) a whole equivalence class of solutions.

As in the case of terms, the preordering $\leq_W$ induces a partial order on $S^\omega/{\cong_W} = \{[\sigma]_W \mid \sigma \in
S^\omega\}$, denoted also by $\leq_W$. Since the definitions of term subsumption $(T^\omega, \leq)$ and of generality
of substitutions $(S^\omega, \leq_W)$ appear analogous we can ask whether an analog of theorem 4 holds
for substitutions. The answer to this question is three quarters positive, one quarter negative:
the analog of theorem 4, part 1, holds (see theorem 14), but the analog of part 2 holds if W is
co-infinite (with respect to V) (see theorem 16) or A is linear (see theorem 15). These results
are presented below.

Eder proved that $(S^\omega, \leq_V)$ is Noetherian [26, corollary 2.19]. Although it is not an immediate
consequence that $(S^\omega, \leq_W)$ is Noetherian where W is any subset of V, Eder's proof can be easily
adapted to take care of this case, too.

Theorem 14  $(S^\omega/{\cong_W}, \leq_W)$ is Noetherian for any $W \subseteq V$.

Proof:     See  [26,  47,  46]. 

As already indicated the analog of theorem 4, part 2 holds if A is linear.

Theorem 15  If A is linear then $(S^\omega/{\cong_W}, \leq_W)$ is a complete lattice.

Here and later we shall make use of Huet's anti-unification algorithm [46, 63]. The
algorithm mscai on $T \times T$ is defined recursively in Figure 5.1.

It is easy to show that mscai(M, N) computes a most specific common anti-instance [63,
theorem 5.8].

Let $\Phi$ be a bijection between $T \times T$ and $V$.

$\mathrm{mscai}(M, N) = \begin{cases} f(\mathrm{mscai}(M_1, N_1), \ldots, \mathrm{mscai}(M_k, N_k)), & \text{if } M = f(M_1, \ldots, M_k),\ N = f(N_1, \ldots, N_k) \\ \Phi(M, N), & \text{otherwise} \end{cases}$

Figure 5.1: Anti-unification algorithm
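The definition in Figure 5.1 translates almost directly into code. The sketch below is illustrative only and rests on several assumptions: variables are strings, compound terms are tuples `(functor, arg1, ...)`, and the bijection $\Phi$ is approximated by a memo table that hands out fresh names `v0`, `v1`, ... for each distinct pair of terms it meets. These names are hypothetical and are assumed not to occur in the input terms.

```python
def make_mscai():
    """Return an anti-unification function with its own table for Phi."""
    phi = {}   # maps a pair of terms (M, N) to the variable Phi(M, N)

    def mscai(m, n):
        both_compound = not isinstance(m, str) and not isinstance(n, str)
        if both_compound and m[0] == n[0] and len(m) == len(n):
            # Same functor and arity: descend into the arguments pairwise.
            return (m[0],) + tuple(mscai(a, b) for a, b in zip(m[1:], n[1:]))
        # Otherwise: the variable assigned to this pair of terms.
        if (m, n) not in phi:
            phi[(m, n)] = f"v{len(phi)}"
        return phi[(m, n)]

    return mscai


# f(x, g(x)) and f(g(y), g(g(y))) disagree at the pair (x, g(y)) twice.
M = ("f", "x", ("g", "x"))
N = ("f", ("g", "y"), ("g", ("g", "y")))
mscai = make_mscai()
print(mscai(M, N))   # ('f', 'v0', ('g', 'v0'))
```

Because the same pair of subterms always receives the same variable, repeated disagreements are generalized consistently, which is what makes the result most specific rather than merely a common anti-instance.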

Proof: (Proof of theorem)

In view of theorem 14 it is sufficient to show that $(S^\omega/{\cong_W}, \leq_W)$ is a lower
semi-lattice. We shall only show this here for W = V, which is the most interesting case
anyway.

For a finite set of variables X the notation X shall denote a term with a "new" |X|-ary
functor whose arguments are the distinct elements of X in some order determined
by X. For finite X, for example, $\sigma(X)/X$ is another way of writing $\sigma|_X$.
Let $\sigma_1, \sigma_2$ be two proper substitutions. We will first construct a substitution $\sigma$ and
then show that $\sigma$ is a lower bound and that any other lower bound is at least as
general as $\sigma$.

Define $Z = \mathrm{dom}\,\sigma_1 \cup \mathrm{dom}\,\sigma_2$, $X = \{x \in Z : \sigma_1(x) = \sigma_2(x)\}$, $Y = \{x \in Z : \sigma_1(x) \neq
\sigma_2(x)\}$, and let $M = \mathrm{mscai}(\sigma_1(Y), \sigma_2(Y))$. Since A is linear we have for the variables
in the range of $\sigma_1$ (or $\sigma_2$) under X, $V' = V(\sigma_1(X))$, that $|V'| \leq |X|$ and consequently,
with $V'' = (X \cup Y) - V'$, that $|V''| \geq |Y|$. Thus there is $M'$ such that $M' \cong M$ and
$V(M') \subseteq V''$. Now we can construct $\sigma$ as

$\sigma(x) = \begin{cases} \sigma_1(x), & x \in X \\ (M'/Y)(x), & \text{otherwise} \end{cases}$

Notice that $\sigma(Z) \leq \sigma_1(Z)$ and thus $\rho_1 = \sigma_1(Z)/\sigma(Z)$ is well-defined. Furthermore,
$\mathrm{dom}\,\rho_1 \subseteq Z$. Since $\rho_1 \circ \sigma = \sigma_1$ this shows that $\sigma$ is a lower bound of $\sigma_1$ with respect
to $\leq_V$. Similarly it is a lower bound of $\sigma_2$.

To see that $\sigma$ is a greatest (most specific) lower bound consider another lower bound
$\sigma'$ of $\sigma_1, \sigma_2$. Define $Z' = Z \cup \mathrm{dom}\,\sigma'$. It is easy to see that $\sigma(Z')$ is a most specific
common anti-instance of $\sigma_1(Z')$ and $\sigma_2(Z')$. Consequently $\delta = \sigma(Z')/\sigma'(Z')$ is
well-defined. Furthermore we have, for $x \in X$, $\sigma'(x) = \sigma(x)$ or $V(\sigma'(x)) \subseteq Z'$, and, for
$x \in Z' - X$, it is always the case that $V(\sigma'(x)) \subseteq Z'$ since otherwise the domain of
either $\sigma_1$ or $\sigma_2$ would have to contain an element from outside $Z'$ (being that $\sigma'$ is a
lower bound of both of them by assumption); but this cannot be as $Z'$ contains the
domains of both $\sigma_1$ and $\sigma_2$ by construction. This, in turn, implies that $\mathrm{dom}\,\delta \subseteq Z$
and thus $\delta \circ \sigma' = \sigma$, which is to say $\sigma' \leq_V \sigma$.

For nonlinear A this structure theorem fails in a major way if W is co-finite: $S^\omega/{\cong_W}$ is
neither a lower nor an upper semi-lattice under the partial order $\leq_W$ if $|V - W| < \infty$. This shall
be proved in the following two propositions.

Proposition 22  For nonlinear A, if W is co-finite, $|V - W| < \infty$, then there is a pair of
substitutions $\sigma_1, \sigma_2 \in S$ with two minimal upper bounds $\upsilon_1, \upsilon_2 \in S$ with respect to $\leq_W$ such that
$\upsilon_1 \not\cong_W \upsilon_2$.

Proof:  Eder [26] shows that the pair of substitutions

$\{x \mapsto f(x, f(y, z)),\ y \mapsto f(x, f(y, z)),\ z \mapsto f(x, f(y, z))\}$
and

$\{x \mapsto f(f(x, y), z),\ y \mapsto f(f(x, y), z),\ z \mapsto f(f(x, y), z)\}$

has an infinite set of minimal upper bounds, but no least upper bound with respect
to $\leq_V$.

A simple generalization of Eder's pair will do the trick. Let W be a co-finite set.
Without loss of generality we can assume that $V - W = \{w_1, \ldots, w_n\}$ for some n
and that $\{x_1, \ldots, x_{n+1}, y_1, \ldots, y_{n+1}, z_1, \ldots, z_{n+1}\}$ is a subset of W. Now with $\rho_i =
\{x_i \mapsto f(x_i, f(y_i, z_i)),\ y_i \mapsto f(x_i, f(y_i, z_i)),\ z_i \mapsto f(x_i, f(y_i, z_i))\}$ and $\sigma_i = \{x_i \mapsto
f(f(x_i, y_i), z_i),\ y_i \mapsto f(f(x_i, y_i), z_i),\ z_i \mapsto f(f(x_i, y_i), z_i)\}$ consider the substitutions

$\rho = \bigcup_{i \in \{1, \ldots, n+1\}} \rho_i$

$\sigma = \bigcup_{i \in \{1, \ldots, n+1\}} \sigma_i$ ¹

The minimal upper bounds of $\rho$ and $\sigma$ are the substitutions

$\bigcup_{i \in \{1, \ldots, n+1\}} \{x_i \mapsto f(f(s_i, t_i), f(u_i, v_i)),\ y_i \mapsto f(f(s_i, t_i), f(u_i, v_i)),\ z_i \mapsto f(f(s_i, t_i), f(u_i, v_i))\}$

for pairwise distinct variables $W' = \{s_1, t_1, u_1, v_1, \ldots, s_{n+1}, t_{n+1}, u_{n+1}, v_{n+1}\}$.
Consider one such minimal upper bound, say $\tau_1$. Simple counting ($W'$ has $4(n+1)$ elements,
the set below at most $4n+3$) shows that there must be some variable $w \in W'$ such that

$w \notin \{w_1, \ldots, w_n, x_1, \ldots, x_{n+1}, y_1, \ldots, y_{n+1}, z_1, \ldots, z_{n+1}\}.$

Thus w is in W. If we consider another minimal upper bound, $\tau_2$, with range variables

$V(\tau_2(\{x_1, \ldots, x_{n+1}, y_1, \ldots, y_{n+1}, z_1, \ldots, z_{n+1}\}))$

disjoint from

$V(\tau_1(\{x_1, \ldots, x_{n+1}, y_1, \ldots, y_{n+1}, z_1, \ldots, z_{n+1}\})),$
then it is clear that $\tau_1 \not\cong_W \tau_2$ because $w \notin V(\tau_2(\mathrm{dom}\,\tau_2))$.


¹ More formally, $\rho = \rho_1 \circ \ldots \circ \rho_{n+1}$ and $\sigma = \sigma_1 \circ \ldots \circ \sigma_{n+1}$. Since the order of composition is insignificant the
informal set union operation on the canonical representations of the $\rho_i$'s and $\sigma_i$'s is well-defined.



This shows that $(S^\omega/{\cong_W}, \leq_W)$ is not an upper semi-lattice for $|V - W| < \infty$. We can also
show that it fails to be a lower semi-lattice.

Proposition 23  For nonlinear A, if W is co-finite, $|V - W| < \infty$, then there is a pair of
substitutions $\sigma_1, \sigma_2 \in S$ with two maximal lower bounds $\upsilon_1, \upsilon_2 \in S$ with respect to $\leq_W$ such that
$\upsilon_1 \not\cong_W \upsilon_2$.

Proof:

We shall only treat the case W = V. The general case is a generalization analogous
to the previous proof.

Let $y_1, \ldots, y_4, z_1, \ldots, z_4$ be eight pairwise distinct variables and let f be an arbitrary
functor with arity 2.² Consider

$\sigma_1 = \{x_1 \mapsto f(f(y_1, y_2), f(y_3, y_4)),\ x_2 \mapsto f(f(y_1, y_2), f(y_3, y_4)),\ x_3 \mapsto f(f(y_1, y_2), f(y_3, y_4))\}$

and

$\sigma_2 = \{x_1 \mapsto f(f(z_1, z_2), f(z_3, z_4)),\ x_2 \mapsto f(f(z_1, z_2), f(z_3, z_4)),\ x_3 \mapsto f(f(z_1, z_2), f(z_3, z_4))\}.$

Both

$\upsilon_1 = \{x_1 \mapsto f(x_1, f(x_2, x_3)),\ x_2 \mapsto f(x_1, f(x_2, x_3)),\ x_3 \mapsto f(x_1, f(x_2, x_3))\}$

and

$\upsilon_2 = \{x_1 \mapsto f(f(x_1, x_2), x_3),\ x_2 \mapsto f(f(x_1, x_2), x_3),\ x_3 \mapsto f(f(x_1, x_2), x_3)\}$

are maximal lower bounds since, at first sight maybe somewhat unexpectedly,

$\{x_1, x_2, x_3 \mapsto f(f(v_1, v_2), f(v_3, v_4))\}$

does not form a lower bound of $\sigma_1$ or $\sigma_2$ for any variables $v_1, \ldots, v_4$. Clearly, $\upsilon_1$ and
$\upsilon_2$ are not equivalent under $\cong_V$.


² Note that there must be at least one functor with arity at least 2 since we assume that (F, a) is nonlinear in
this section; w.l.o.g. we can assume that F contains a functor with arity exactly 2.

This "misbehavior" of $(S^\omega/{\cong_W}, \leq_W)$ for co-finite W is due to the fact that we
cannot "hide" enough variables from "consideration" under $\leq_W$ if there is not enough "room"
in $V - W$. For subsets W of V that leave "enough" variables hidden in $V - W$ (that is, for co-infinite
W's) the partial orders $(S^\omega/{\cong_W}, \leq_W)$ do indeed have a lattice structure. The proof of this is a
consequence of the more general theorem 18 proved in section 5.2.

Theorem 16  For nonlinear A the following statements are equivalent.

•  $(S^\omega/{\cong_W}, \leq_W)$ is a complete lattice.

•  W is co-infinite; that is, $|V - W| = \infty$.

Henceforth  we  shall  deal  almost  exclusively  with  nonlinear  alphabets.  As  we  have  already  seen 
the  theory  of  substitutions  and  (semi-)unifiers  is  very  different  for  linear  and  nonlinear  alphabets. 
In  fact,  the  case  of  term  inequalities  over  linear  alphabets  is  algebraically  and  computationally 
much  simpler  than  for  nonlinear  alphabets.  It  is  treated  in  [15]  under  the  name  prefix  inequalities. 

5.2      The  Structure  of  Semi-Unifiers 

It is often quoted that most general unifiers are unique "up to renaming of variables". As pointed
out in [63] there are several distinct notions of what this innocuous-looking little phrase can be
taken to mean. The most commonly used notion is strong equivalence (i.e., equivalence modulo
$\cong_V$). While different notions lead to a slightly different structure of unifiers for a given system
of equations, they all admit the existence of most general unifiers, though most general unifiers
with respect to one notion (e.g., [113]) are not necessarily most general with respect to another
equivalence.

The fact that there are most general unifiers under any of the different notions of renaming
may have prompted Chou to write that, similarly, "it is evident" that the most general semi-unifier
of an SEI is unique modulo strong equivalence, if it exists at all [15, page 11]. The
breakdown in the analogy of the structure of $T/{\cong}$ and $S/{\cong_V}$ (see theorem 16 and the discussion
before it), however, already suggests that this claim may not be true in general, and, indeed,
it is incorrect.³ A weaker notion of equivalence (see, e.g. [113, chapter 4]), however, admits
the existence of most general semi-unifiers and an equivalent to the main structure theorem for
unifiers.

5.2.1      Strong  Equivalence 

Strong equivalence, $\cong_V$, corresponds to renaming of substitutions by composition of permutation
substitutions; i.e., by substitutions $\alpha$ for which there is $\alpha^{-1}$ such that $\alpha \circ \alpha^{-1} = \alpha^{-1} \circ \alpha = \iota$.
Two substitutions $\sigma_1$ and $\sigma_2$ are strongly equivalent if and only if there is such a permutation
substitution $\alpha$ such that $\alpha \circ \sigma_1 = \sigma_2$. Strong equivalence has attracted a lot of attention because
of its close connection to idempotent substitutions, which in turn are strongly related to systems
of equations.

In this subsection the terms "minimal" and "most general" always refer to $\leq_V$.


³ We feel tempted to say that, in view of theorem 16, uniqueness of most general unifiers with respect to strong
equivalence is a "lucky coincidence"; or, less dramatically, a very specific property of unification that cannot
simply be "transferred" to other problems; or, in more neutral terms, an outgrowth of the fact that the theory of
unifiers can be viewed as a representation theory for idempotent substitutions, which indeed form a lattice with
respect to $\leq_V$ [26, theorem 4.9].

Strong Equivalence and Unifiers

We recapitulate the most important result on the structure of unifiers modulo strong equivalence
from [26] (see also [63]). Note that every SEI has a minimal unifier.⁴ This follows immediately
from theorem 14. We call a minimal unifier $\sigma$ of S a most general unifier of S if for all unifiers
$\upsilon$ of S there is a substitution $\rho$ such that $\rho \circ \sigma = \upsilon$.

A substitution $\sigma$ is idempotent if it satisfies $\sigma \circ \sigma = \sigma$. The set of proper idempotent
substitutions is denoted by IS(A, V) (or just IS), and the set of all idempotent substitutions is
denoted by $IS^\omega(A, V)$ (or simply $IS^\omega$). The significance of idempotent substitutions and their
relation to unification is summarized in the main structure theorem of systems of equations.

Theorem 17  1.  Every system of equations S has a most general unifier that is idempotent,
and for every idempotent substitution $\sigma$ there is a system of equations $S'$ such that $\sigma$ is a
most general unifier of $S'$ (with respect to $\leq_V$).

2.  $((IS^\omega \cap U(S))/{\cong_V}, \leq_V)$ is a complete lattice for every system of equations S.

Proof:  By refinement of the proof of theorem 4.9 in [26].

Since there are substitutions that are not strongly equivalent to any idempotent substitution,
we have as a consequence of theorem 17 that there are substitutions in S that are not most
general unifiers. For example, $\{x_1 \mapsto f(x_1), \ldots, x_n \mapsto f(x_n)\}$ is not strongly equivalent to any
idempotent substitution.
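Idempotence itself is easy to test by composing a substitution with itself. The sketch below is illustrative and assumes variables are strings, compound terms are tuples `(functor, arg1, ...)`, and substitutions are dictionaries without identity bindings; `compose` and `is_idempotent` are hypothetical names. It only checks whether a given substitution is idempotent. It does not decide the stronger claim that no strongly equivalent substitution is idempotent.

```python
def apply(sub, term):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(term, str):                      # a variable
        return sub.get(term, term)
    return (term[0],) + tuple(apply(sub, arg) for arg in term[1:])


def compose(outer, inner):
    """Return outer o inner, i.e. the map x -> outer(inner(x))."""
    result = {}
    for x in set(outer) | set(inner):
        image = apply(outer, apply(inner, x))
        if image != x:                             # drop identity bindings
            result[x] = image
    return result


def is_idempotent(sub):
    return compose(sub, sub) == sub


looping = {"x1": ("f", "x1"), "x2": ("f", "x2")}   # the example with n = 2
settled = {"x1": ("f", "y")}
print(is_idempotent(looping))   # False: x1 is sent to f(f(x1)) by the square
print(is_idempotent(settled))   # True: y is not in the domain
```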

Part  1  of  theorem  17  expresses  not  only  that  every  system  of  equations  has  a  most  general 
unifier,  but  that  there  is  always  an  idempotent  most  general  substitution.  An  instance  of  the 
theorem  is  Eder's  original  structure  theorem  for  idempotent  substitutions. 

Corollary 24  $(IS^\omega/{\cong_V}, \leq_V)$ is a complete lattice.

Proof:     Consider  S  =  ()  in  theorem  17. 

Strong  Equivalence  and  Semi-Unifiers 

The  set  of  idempotent  unifiers  of  any  system  of  equations  forms  a  lattice.  The  fact  that  every 
system  of  equations  has  an  idempotent  most  general  unifier  justifies  in  some  sense  the  restriction 
of  consideration  to  idempotent  substitutions  and  unifiers,  as  is  done  from  the  outset  in  [105]. 

In this subsection we show that idempotent substitutions and strong equivalence fail to
capture the structure of semi-unifiers in a major way; namely,

1.  for any SEI S neither U(S) nor USU(S) nor SU(S) induces a lower or upper semi-lattice
(under $\leq_V$);

2.  there are systems of equations and inequalities that have a most general semi-unifier, but
no idempotent one;

3.  there are systems of equations and inequalities with no most general semi-unifier.

Proposition 25  For nonlinear A neither $(U(S)/{\cong_V}, \leq_V)$ nor $(USU(S)/{\cong_V}, \leq_V)$ nor
$(SU(S)/{\cong_V}, \leq_V)$ forms a lower or upper semi-lattice for any SEI S.

Proof:  Almost directly from the proofs of propositions 22 and 23.


⁴ A unifier $\sigma$ of an SEI S is minimal if for every other unifier $\sigma'$ of S it holds that $\sigma' \leq \sigma \Rightarrow \sigma \leq \sigma'$.

Proposition 26  For nonlinear A there is an infinite family of SEI's $S_1, \ldots, S_i, \ldots$ such that,
for all $i \in \mathbb{N}$, $S_i$ has uniform and nonuniform minimal semi-unifiers $\sigma_{i1}$ and $\sigma_{i2}$, but $\sigma_{i1} \not\cong_V \sigma_{i2}$.

Proof:

Consider $S_i = (f(x_1, \ldots, x_i) \leq y)$. The substitutions

$\sigma_{i1} = \{y \mapsto f(u_1, \ldots, u_i)\}$

and

$\sigma_{i2} = \{y \mapsto f(v_1, \ldots, v_i)\}$

are minimal semi-unifiers of $S_i$ since only for $\rho = \{\}$ we have $\rho <_V \sigma_{i1}$ or $\rho <_V \sigma_{i2}$,
and $\{\}$ is not a semi-unifier of $S_i$. But there is no substitution $\alpha \in S$ such that

$\alpha \circ \sigma_{i1} = \sigma_{i2}$ or $\alpha \circ \sigma_{i2} = \sigma_{i1}$.

Proposition 27  There is an infinite family of SEI's $S_1, \ldots, S_i, \ldots$ such that, for all $i \in \mathbb{N}$, $S_i$
has a most general uniform and nonuniform semi-unifier, but no idempotent one.

Proof:

Consider $S_i = (f(y_1) \leq x_1, \ldots, f(y_i) \leq x_i)$. The substitution

$\sigma_i = \{x_1 \mapsto f(x_1), \ldots, x_i \mapsto f(x_i)\}$

and its $\cong_V$-equivalent substitutions are the only most general uniform and nonuniform
semi-unifiers of $S_i$. As we remarked earlier there is no idempotent substitution
amongst them.

The reasons why $S^\omega$, U(S), USU(S), SU(S) fail to be lattices under $\leq_V$ are intuitively rather
pathological and cast some doubt on the appropriateness of choosing strong equivalence as the
"proper" notion of renaming on substitutions for semi-unification.

5.2.2      Weak  Equivalence 

In  this  section  we  define  an  equivalence  relation  on  substitutions  relative  to  systems  of  equations 
and  inequalities  that  is  properly  weaker  than  strong  equivalence.  We  will  show  that  this  relation, 
weak  equivalence,  ties  general  substitutions  and  systems  of  equations  and  inequalities  together 
just  as  strong  equivalence  ties  idempotent  substitutions  and  systems  of  equations  together  (the- 
orem 17). 

Definition 11  (Weak equivalence)

Substitutions $\sigma_1$ and $\sigma_2$ are called weakly equivalent with respect to SEI S (or simply
S-equivalent) if $\sigma_1 \cong_{V(S)} \sigma_2$.

A k-ary context $C[\,]$ is a term with k "holes" in it such that $C[M_1, \ldots, M_k]$ is the (complete)
term with the terms $M_1, \ldots, M_k$ in place of the holes in $C[\,]$. More formally, a k-ary context is a
term $C[\,] \in T(A, V \cup MV)$ where MV is a k-element set $\{y_1, \ldots, y_k\}$ of meta-variables disjoint
from V and F. For a substitution $\sigma : V \cup MV \to T(A, V)$, $\sigma = \{y_1 \mapsto M_1, \ldots, y_k \mapsto M_k\}$, the result
of applying $\sigma$ to C is denoted by $C[M_1, \ldots, M_k]$.

Recall that $(T^\omega/{\cong}, \leq)$ is a complete lattice with $\wedge$ and $\vee$ denoting the infimum and supremum
operator, respectively.

Lemma 28  There is an operation $\wedge : T \times T \to T$ such that

1.  $[M \wedge N] = [M] \wedge [N]$ for all $M, N \in T$.

2.  $C[M_1, \ldots, M_k] \wedge C[M'_1, \ldots, M'_k] = C[M_1 \wedge M'_1, \ldots, M_k \wedge M'_k]$ for all k, k-ary contexts C,
and terms $M_1, \ldots, M_k$ and $M'_1, \ldots, M'_k$.

Proof:

1.  Consider the anti-unification algorithm in Figure 5.1 and define $M \wedge N =
\mathrm{mscai}(M, N)$. (See [46]; see also [63].)

2.  The definition of $\wedge$ has the property $f(M) \wedge f(N) = f(M \wedge N)$ for every functor
f. The result follows by structural induction on $C[\,]$.

For every operation that satisfies lemma 28, part 1, the following proposition holds.

Proposition 29  For all terms $M_1, M_2, N_1, N_2 \in T$ such that $M_1 \leq M_2$ and $N_1 \leq N_2$ it holds
that $M_1 \wedge N_1 \leq M_2 \wedge N_2$.

For any SEI S, we call a (uniform) semi-unifier $\sigma$ of S a most general (uniform) semi-unifier of
S if for all (uniform) semi-unifiers $\upsilon$ of S there is a substitution $\rho$ such that $(\rho \circ \sigma)|_{V(S)} = \upsilon|_{V(S)}$.
Similarly, from now on a unifier of S will be called most general if it is minimum with respect
to $\leq_{V(S)}$ instead of $\leq_V$ as in the previous section.

Now  we  are  ready  to  prove  the  main  theorem  of  this  section. 

Theorem 18  1.  Every system of equations and inequalities S has a most general (uniform)
semi-unifier, and for every substitution $\sigma$ there is a system of equations and inequalities
$S'$ such that $\sigma$ is a most general (uniform) semi-unifier of $S'$.

2.  $(SU(S)/{\cong_{V(S)}}, \leq_{V(S)})$ (as well as $(USU(S)/{\cong_{V(S)}}, \leq_{V(S)})$ and $(U(S)/{\cong_{V(S)}}, \leq_{V(S)})$) is a complete
lattice for every system of equations and inequalities S.

As an immediate consequence we have

Corollary 30  Every solvable SEI S has a most general idempotent semi-unifier.

Proof:  (Proof of corollary)

Take a most general semi-unifier $\sigma$ of S. If $V(S) = \{x_1, \ldots, x_n\}$ define $\sigma' = \{x_1 \mapsto
x'_1, \ldots, x_n \mapsto x'_n\} \circ \sigma$ where $x'_1, \ldots, x'_n$ are pairwise distinct variables not occurring
in S. Then $\sigma'$ is idempotent and a most general semi-unifier of S.
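The renaming step in this proof can be tried out on a small case. The sketch below is illustrative only: it assumes variables are strings, compound terms are tuples `(functor, arg1, ...)`, substitutions are dictionaries whose domain lies inside V(S), and primed variables are written with a trailing apostrophe. The helper names are hypothetical.

```python
def apply(sub, term):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(term, str):                      # a variable
        return sub.get(term, term)
    return (term[0],) + tuple(apply(sub, arg) for arg in term[1:])


def compose(outer, inner):
    """Return outer o inner, i.e. the map x -> outer(inner(x))."""
    result = {}
    for x in set(outer) | set(inner):
        image = apply(outer, apply(inner, x))
        if image != x:
            result[x] = image
    return result


def is_idempotent(sub):
    return compose(sub, sub) == sub


def rename_apart(sigma, variables_of_S):
    """Build {x -> x'} o sigma for the variables x occurring in S."""
    renaming = {x: x + "'" for x in variables_of_S}
    return compose(renaming, sigma)


sigma = {"x": ("f", "x")}                  # not idempotent
sigma_prime = rename_apart(sigma, ["x"])   # {x -> f(x')}
print(is_idempotent(sigma), is_idempotent(sigma_prime))
```

After renaming, the range of the substitution mentions only primed variables, none of which lie in its domain, so applying it a second time changes nothing.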

The theorem can be strengthened and still holds if we replace $\cong_{V(S)}$ (weak equivalence) and
$\leq_{V(S)}$ by $\cong_W$ and $\leq_W$, respectively, where W is any co-infinite subset of V containing V(S).
This strengthened version of theorem 18, part 2, implies theorem 16 (let S = ()).

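Proof:  (Proof of theorem)

For part 2, since every complete semi-lattice is automatically a complete lattice and
since every Noetherian lower semi-lattice is a complete lower semi-lattice, it is
sufficient to show that $(SU(S)/{\cong_{V(S)}}, \leq_{V(S)})$ is a lower semi-lattice.

Let $\sigma_1$ and $\sigma_2$ be semi-unifiers of S. Let $x_1, \ldots, x_k$ be the set V(S) of variables
occurring in S. Denote $\sigma_1(x_i)$ by $M_i$ and $\sigma_2(x_i)$ by $N_i$ for $1 \leq i \leq k$. Now define
$\sigma = \{x_1 \mapsto M_1 \wedge N_1, \ldots, x_k \mapsto M_k \wedge N_k\}$ with $\wedge$ defined as in lemma 28.

First we show that $\sigma$ is a semi-unifier of S. Without loss of generality (see proof
of proposition 1) we can assume that S consists of one equation and n inequalities.
There are contexts $C_0, C_1, \ldots, C_n$ and $C'_0, C'_1, \ldots, C'_n$ such that S is equal to

\[
\begin{array}{rcll}
C_0[x_1, \ldots, x_k] & = & C'_0[x_1, \ldots, x_k] & \text{(equation)} \\
C_1[x_1, \ldots, x_k] & \leq & C'_1[x_1, \ldots, x_k] & \\
 & \vdots & & \text{(inequalities)} \\
C_n[x_1, \ldots, x_k] & \leq & C'_n[x_1, \ldots, x_k] &
\end{array}
\]

By assumption both $\sigma_1$ and $\sigma_2$ are semi-unifiers of S, and so

\[
\begin{array}{rcl}
C_0[M_1, \ldots, M_k] & = & C'_0[M_1, \ldots, M_k] \\
C_1[M_1, \ldots, M_k] & \leq & C'_1[M_1, \ldots, M_k] \\
 & \vdots & \\
C_n[M_1, \ldots, M_k] & \leq & C'_n[M_1, \ldots, M_k]
\end{array}
\]

holds as well as

\[
\begin{array}{rcl}
C_0[N_1, \ldots, N_k] & = & C'_0[N_1, \ldots, N_k] \\
C_1[N_1, \ldots, N_k] & \leq & C'_1[N_1, \ldots, N_k] \\
 & \vdots & \\
C_n[N_1, \ldots, N_k] & \leq & C'_n[N_1, \ldots, N_k]
\end{array}
\]

By proposition 29 this implies that

\[
\begin{array}{rcl}
C_0[M_1, \ldots, M_k] \wedge C_0[N_1, \ldots, N_k] & = & C'_0[M_1, \ldots, M_k] \wedge C'_0[N_1, \ldots, N_k] \\
C_1[M_1, \ldots, M_k] \wedge C_1[N_1, \ldots, N_k] & \leq & C'_1[M_1, \ldots, M_k] \wedge C'_1[N_1, \ldots, N_k] \\
 & \vdots & \\
C_n[M_1, \ldots, M_k] \wedge C_n[N_1, \ldots, N_k] & \leq & C'_n[M_1, \ldots, M_k] \wedge C'_n[N_1, \ldots, N_k]
\end{array}
\]

holds, and by lemma 28, part 2, we conclude that

\[
\begin{array}{rcl}
C_0[M_1 \wedge N_1, \ldots, M_k \wedge N_k] & = & C'_0[M_1 \wedge N_1, \ldots, M_k \wedge N_k] \\
C_1[M_1 \wedge N_1, \ldots, M_k \wedge N_k] & \leq & C'_1[M_1 \wedge N_1, \ldots, M_k \wedge N_k] \\
 & \vdots & \\
C_n[M_1 \wedge N_1, \ldots, M_k \wedge N_k] & \leq & C'_n[M_1 \wedge N_1, \ldots, M_k \wedge N_k]
\end{array}
\]

holds true. This, in turn, shows that $\sigma$ is a semi-unifier of S.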
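The construction of $\sigma$ as a pointwise meet can be illustrated on the one-inequality system $f(x_1) \leq y$ with the two semi-unifiers $\{y \mapsto f(u)\}$ and $\{y \mapsto f(v)\}$. The sketch below is a hedged illustration, not the thesis's algorithm. It assumes variables are strings and compound terms are tuples `(functor, arg1, ...)`. The operation $\wedge$ is approximated by anti-unification with one shared memo table across all variables of S, and the fresh names `w0`, `w1`, ... and all helper names are hypothetical.

```python
def apply(sub, term):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(term, str):
        return sub.get(term, term)
    return (term[0],) + tuple(apply(sub, arg) for arg in term[1:])


def match(pattern, target, rho):
    """Extend rho so that rho(pattern) == target; return False if impossible."""
    if isinstance(pattern, str):
        if pattern in rho:
            return rho[pattern] == target
        rho[pattern] = target
        return True
    if isinstance(target, str) or pattern[0] != target[0] or len(pattern) != len(target):
        return False
    return all(match(p, t, rho) for p, t in zip(pattern[1:], target[1:]))


def meet_substitutions(sigma1, sigma2, variables_of_S):
    """Pointwise anti-unification; the table phi is shared by all variables."""
    phi = {}

    def meet(m, n):
        if (not isinstance(m, str) and not isinstance(n, str)
                and m[0] == n[0] and len(m) == len(n)):
            return (m[0],) + tuple(meet(a, b) for a, b in zip(m[1:], n[1:]))
        if (m, n) not in phi:
            phi[(m, n)] = f"w{len(phi)}"
        return phi[(m, n)]

    return {x: meet(apply(sigma1, x), apply(sigma2, x)) for x in variables_of_S}


def solves_inequality(sub, left, right):
    """Check sub(left) <= sub(right), i.e. some rho maps the left side to the right."""
    return match(apply(sub, left), apply(sub, right), {})


sigma1 = {"y": ("f", "u")}
sigma2 = {"y": ("f", "v")}
sigma = meet_substitutions(sigma1, sigma2, ["x1", "y"])
print(sigma)
print(solves_inequality(sigma, ("f", "x1"), "y"))
```

Sharing one table across all variables of S is essential: it is what lets the meet commute with contexts as in lemma 28, part 2.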

We now show that any other semi-unifier $\sigma'$ that is a lower bound of both $\sigma_1$ and $\sigma_2$
is also a lower bound of $\sigma$. Define $\sigma'(x_i) = L_i$ for $1 \leq i \leq k$. Since $\sigma'$ is a lower bound
of $\sigma_1$ (with respect to $\leq_{V(S)}$) it holds that $[L_1, \ldots, L_k] \leq [M_1, \ldots, M_k]$ for some
arbitrary functor $[\ldots]$ written in bracket-notation; similarly, $[L_1, \ldots, L_k] \leq [N_1, \ldots, N_k]$.
Consequently, $[L_1, \ldots, L_k] \leq [M_1, \ldots, M_k] \wedge [N_1, \ldots, N_k]$ and, by lemma 28, part
2, we have $[L_1, \ldots, L_k] \leq [M_1 \wedge N_1, \ldots, M_k \wedge N_k]$; i.e., there is a substitution $\rho$
such that $\rho([L_1, \ldots, L_k]) = [M_1 \wedge N_1, \ldots, M_k \wedge N_k]$. But this immediately implies
$\rho(\sigma'(x_i)) = \sigma(x_i)$ for $1 \leq i \leq k$, and thus $\sigma' \leq_{V(S)} \sigma$.

Part 2 implies one half of part 1, that every system of equations and inequalities
has a most general semi-unifier. Conversely, let $\sigma$ be an arbitrary substitution.
If $\sigma = \omega$ then clearly $\sigma$ is a most general semi-unifier of $\{f(x) = x\}$. If
$\sigma = \{x_1 \mapsto M_1, \ldots, x_k \mapsto M_k\}$ let $\alpha = \{x_1 \mapsto x'_1, \ldots, x_k \mapsto x'_k\}$ where $x'_1, \ldots, x'_k$
are pairwise distinct variables disjoint from $x_1, \ldots, x_k$. Now define $S = \{\alpha(M_1) \leq
x_1, \ldots, \alpha(M_k) \leq x_k\}$. Clearly, $\sigma$ is a most general semi-unifier of S.

There  are  more  constructive  proofs  of  the  uniqueness  of  most  general  semi-unifiers  modulo 
weak  equivalence,  but  they  do  not  yield  the  powerful  structure  theorem  18.  The  algorithmic  spec- 
ifications (functional,  rewriting,  and  graph-theoretic)  for  computing  most  general  semi-unifiers 
in  chapter  6,  for  example,  can  be  turned  independently  into  proofs  of  the  existence  of  most 
general  semi-unifiers.  In  fact  their  proofs  of  correctness  constitute  alternative  proofs,  although 
additional  care  is  necessary  since  the  specifications  may  not  be  uniformly  terminating. 

5.3      The  Structure  of  Typings  and  Typing  Derivations 

It  is  interesting  that  the  main  structure  theorem  for  semi-unifiers,  theorem  18,  yields  a  "simul- 
taneous" proof  of  the  principal  typing  property  of  CH,  DM,  MM,  and  FMM  via  the  reduction  in 
theorem  8  of  chapter  4.  Something  even  stronger  can  be  said  about  typings  and  their  derivations 
in  the  syntax-oriented  versions  of  our  type  disciplines  since  the  reduction  in  theorem  8  translates 
every  typing  derivation  into  a  solution  of  the  corresponding  semi-unification  problem  instance. 
Consider a substitution on monotypes, $S : TV \rightarrow M$. $S$ can be applied to a polytype
$\sigma$ by simultaneously replacing only the free variables in $\sigma$ all the while renaming bound type
variables in $\sigma$ to avoid capture of (necessarily free) type variables from $S$. Such a substitution
can thus be extended to type assignments, $S(A)(x) = S(A(x))$, $x \in \mathrm{dom}\, A$, to typings, $S(A \supset
e : \sigma) = S(A) \supset e : S(\sigma)$, and to whole derivation trees. We can also extend the generic instance
preordering on polytypes of chapter 2, $\sigma_1 \sqsubseteq \sigma_2$, to type assignments by $A \sqsubseteq A' \Leftrightarrow (\forall x \in
\mathrm{dom}\, A)\ A(x) \sqsubseteq A'(x)$. Finally, we define the relation $(A \supset e : \sigma) \le (A' \supset e' : \sigma')$: it holds if and
only if there is a substitution $S$ such that

1.  $S(A) \sqsubseteq A'$,

2.  $e = e'$,

3.  $S(\sigma) \sqsubseteq \sigma'$.

Finally, for two proof trees (in a fixed typing calculus), $P$ and $P'$, we define $P \le P'$ if $P$ and $P'$
are structurally isomorphic and there is a substitution $S$ such that $(A \supset e : \sigma) \le (A' \supset e' : \sigma')$
holds for every corresponding pair of typings $(A \supset e : \sigma) \in P$ and $(A' \supset e' : \sigma') \in P'$. Clearly $\le$
defines a preorder that induces canonically a partial order, also denoted by $\le$.

Corollary 31   Let X = CH, DM, MM, or FMM, and let X' denote the syntax-oriented version
of X. For any expression $e \in \Lambda$ that is X-typable,

1.  the  set  of  derivable  typings  for  e  in  X,  respectively  X',  forms  a  complete  lattice  w.r.t.   the 
partial  order  <  on  typings; 

2.  the  set  of  all  proof  trees  for  e  in  X'  forms  a  complete  lattice  w.r.t.  the  partial  order  <  on 
proof  trees. 

Proof:  Because of theorem 5 part (2) implies part (1). An inspection of the proofs
of theorems 6 and 7 reveals that proof trees for $e$ and solutions of the canonical
system of equations and inequalities $SEI(e)$ are in a one-one correspondence and the
composition of the two reductions is strongly monotonic in the sense that if $P$ and
$P'$ are derivations for $e$ and $S$ and $S'$ are the corresponding semi-unifiers of $SEI(e)$
then $P \le P' \Leftrightarrow S \le S'$.

The first part of this corollary implies that there is a least typing $A \supset e : \sigma$ for every typable
$e$. This can be read as a generalized principal typing property since it is not relative to a
fixed type assignment [23]. The second part may have practical applications in an incremental
compiler: it should be quite practical to maintain the principal type information with any well-
defined program fragment and perform a "meet"-operation with new type information once it is
available. The corollary certifies that this is always possible (see also [74]).

Chapter 6

Algorithmic  Specification  of 
Most  General  Semi-Unifiers 


In  chapter  4  we  showed  that  semi-unification  is  at  the  heart  of  polymorphic  type  inference  in  the 
Mycroft  Calculus.  In  chapter  5  we  saw  that  every  system  of  equations  and  inequalities  (SEI)  has 
a  most  general  semi-unifier,  which  is  unique  up  to  weak  equivalence.  In  this  chapter  we  address 
the  problem  of  computing  most  general  semi-unifiers.  It  appears  natural  to  expect  that  in  order 
to solve the decision problem of semi-unifiability it is essentially necessary to compute most
general  semi-unifiers  since  they  represent  the  least  commitment  to  substitution  decisions.  It  is 
interesting  then  to  see  that  Kapur  et  al.  achieve  a  polynomial-time  algorithm  for  uniform  semi- 
unification  by  exploiting  a  property  that  makes  it  possible  to  "abandon"  most  general  (uniform) 
semi-unifiers  and  compute  a  more  specific  semi-unifier.  This  is  possible  because  the  more  specific 
semi-unifier  is  guaranteed  to  exist  if  and  only  if  the  most  general  semi-unifier  exists,  which  is 
the  case  if  and  only  if  there  is  any  (uniform)  semi-unifier  at  all.  This  property  does  not  hold 
for  two  or  more  inequalities,  and  hence  computing  most  general  semi-unifiers  seems  the  best 
approach  for  obtaining  a  correct  decision  algorithm  for  semi-unification.  The  functional  problem 
of  semi-unification  —  computing  a  most  general  semi-unifier  —  is  of  independent  importance 
in  its  application  in  type  inference.  In  ML,  for  example,  a  program  that  is  submitted  for  type 
checking  is  annotated  with  type  information,  its  principal  type.  We  would  also  like  to  have 
complete type information for all the program fragments making up the whole program. This
amounts to computing the most general semi-unifier of the SEI encoding the typing constraints
of  the  program  and  printing  it  out  as  an  annotation  of  the  program. 

We  present  three  algorithmic  specifications  for  computing  the  most  general  semi-unifier  of  an 
SEI  in  this  chapter.  The  first  one  is  a  functional  specification  that  is  proved  partially  correct  by 
fixed point induction. The second one is an SEI-rewriting specification whose partial correctness
follows from a soundness and completeness theorem that shows that the class of solutions is
invariant under rewritings. The third specification is a graph-theoretic version of the SEI-rewriting
specification. Its encoding of SEI's by arrow graphs, which are term graphs with some additional
structure, not only saves execution time and space over the SEI-rewriting formulation, it also
seems  more  appropriate  for  analyzing  its  termination  properties  since  all  these  specifications 
make  use  of  a  basically  "nonlocal"  failure  criterion  called  the  extended  occurs  check.  These  three 
specifications  can  be  viewed  as  manifestations  (or  implementations)  of  one  abstract  algorithm. 
We  conjecture  that  this  algorithm  is  uniformly  terminating,  and  that  thus  semi-unification  and 
the  Mycroft  Calculus  are  decidable. 

We have been implicitly suggesting that it is acceptable to talk about type inference and
semi-unification interchangeably. This informality may be unwarranted if the reduction of type
inference to semi-unification, as in chapter 4, needs to be done "off-line", as a proper preprocessing
step to semi-unification, since this would be very undesirable in an interactive environment.
Fortunately, it is quite easy to see that this reduction can be done "on-line", just as lexical,
syntax, and semantic analysis in compilers can usually be "jammed" to a large degree. This
enables compilers to operate in interactive and incremental environments. Since a "direct"
syntax-oriented type inference algorithm, based on algorithm A, is quite easy to obtain, yet
raises a different set of issues that are more practical than those addressed in this thesis, we shall
refrain from delving into details and only present algorithmic specifications for semi-unification.

6.1      Functional  Specification 

We  now  provide  a  functional  specification  of  a  most  general  semi-unifier  of  an  SEI  S,  which  we 
prove  partially  correct.  W.l.o.g.  we  may  assume  that  SEI's  have  at  most  one  equation  and  at 
most  one  inequality  per  inequality  group  and  that  the  SEI's  are  over  alphabet  A2.  We  start  with 
some  definitions  and  notational  conventions  used  later. 

Definition 12  1.  A $k$-dimensional constraint mapping $R$ is a sequence $(R_1, \ldots, R_k)$ of finite
maps from $V$ to $T$ that are undefined almost everywhere.^1 The domain $D(R_i)$ of $R_i$ is the
set of variables $x$ for which $R_i(x)$ is defined.^2 A component of a constraint mapping can
be applied to a term $\tau \in T$ by recursively applying it to the subterms of $\tau$ as long as it is
defined for every variable occurring in $\tau$; otherwise, the result is undefined.

2.  Let $R = (R_1, \ldots, R_k)$ and $R' = (R_1', \ldots, R_k')$ be constraint mappings; let $\tau_1, \tau_2 \in T$
be terms. A substitution $U$ is an $R$-compatible semi-unifier in the $i$-th dimension ($R$-
compatible unifier) of $\tau_1$ and $\tau_2$ via $R'$ if^3

(a)  $R_i'(U(\tau_1)) = U(\tau_2)$ (respectively, $U(\tau_1) = U(\tau_2)$)

(b)  $(\forall j \in \{1, \ldots, k\})(\forall x \in D(R_j))\ R_j'(U(x)) = U(R_j(x))$
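
Conditions (a) and (b) can be checked mechanically for given $U$, $R$ and $R'$. The following Python sketch is only an illustration under stated assumptions: terms are encoded as strings (variables) and tuples (functor applications), each component $R_j$ is a `dict`, substitutions are dictionaries that act as the identity outside their keys, dimensions are numbered from 0, and all function names are hypothetical.

```python
def subst(U, t):
    """Apply substitution U (identity outside its keys) to term t."""
    if isinstance(t, str):
        return U.get(t, t)
    return (t[0],) + tuple(subst(U, a) for a in t[1:])

def apply_component(Ri, t):
    """Apply one component of a constraint mapping; None means 'undefined'."""
    if isinstance(t, str):
        return Ri.get(t)
    args = [apply_component(Ri, a) for a in t[1:]]
    if any(a is None for a in args):
        return None               # undefined on some variable of t
    return (t[0],) + tuple(args)

def is_compatible_semi_unifier(U, i, R, R_new, t1, t2):
    """Check conditions (a) and (b) for dimension i (0-based)."""
    # (a) R'_i(U(t1)) = U(t2)
    if apply_component(R_new[i], subst(U, t1)) != subst(U, t2):
        return False
    # (b) R'_j(U(x)) = U(R_j(x)) for every x on which R_j is defined
    for Rj, Rj_new in zip(R, R_new):
        for x, rhs in Rj.items():
            if apply_component(Rj_new, subst(U, x)) != subst(U, rhs):
                return False
    return True

# x against g(y) with an empty input mapping: recording x -> g(y) suffices.
ok = is_compatible_semi_unifier({}, 0, ({},), ({"x": ("g", "y")},), "x", ("g", "y"))
bad = is_compatible_semi_unifier({}, 0, ({},), ({},), "x", ("g", "y"))
```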

In the mutually recursive function specifications $V$ (Figure 6.1) and $U$ (Figure 6.2) we use
quite standard notational conventions from both ALGOL-like and functional languages. Some
notations are specific to our applications domain, though. $R_i\{x : \tau_2\}$ means the same thing
for constraint mappings as it does for type environments: it denotes the constraint mapping
identical to $R$ with the only (possible) difference that $R_i\{x : \tau_2\}(x)$ is $\tau_2$ no matter whether
$R_i(x)$ is defined or undefined. The function "new" takes two arguments, a term $\tau_1$ and a set of
variables $\Phi$. It returns a term $\tau_1'$ that is obtained from $\tau_1$ by replacing all variables in $\tau_1$ with
variables not in $\Phi$; for convenience, it also returns the set of new variables thus introduced.^4 The
operator $\circ$ denotes functional composition.
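
A possible reading of "new" is sketched below in Python. Several points are assumptions rather than part of the specification: the argument order follows the call `new(Φ, τ1)` in Figure 6.1, repeated occurrences of a variable receive the same fresh variable, fresh names have the form `v0`, `v1`, ..., and terms are encoded as strings (variables) and tuples (functor applications).

```python
from itertools import count

_fresh_ids = count()   # global supply of candidate names (a gensym-style counter)

def new(phi, t):
    """Return (t', phi'): t with its variables renamed away from phi,
    together with the set phi' of variables introduced."""
    renaming = {}

    def rename(s):
        if isinstance(s, str):
            if s not in renaming:
                name = "v%d" % next(_fresh_ids)
                while name in phi:            # skip names already in use
                    name = "v%d" % next(_fresh_ids)
                renaming[s] = name
            return renaming[s]
        return (s[0],) + tuple(rename(a) for a in s[1:])

    t_new = rename(t)
    return t_new, set(renaming.values())

phi = {"x", "y", "v0"}
t_new, phi_new = new(phi, ("f", "x", ("f", "x", "y")))
```

Because the counter only increases, two calls never hand out the same name, which is all the specification requires of "new".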

The function $V$ takes five arguments: a set of variables $\Phi$, an index $i$ between 1 and $k$, a
$k$-dimensional constraint mapping $R$, and two terms $\tau_1$ and $\tau_2$. $U$ takes the same arguments
except for the index. Both functions return a substitution and a (new) constraint mapping.
In both cases the first argument, $\Phi$, is only there for technical reasons to facilitate a "true"
functional specification (and the correctness proof of the following lemma). For all practical


^1 Recall that $T = T(\mathcal{A}_2, V)$.
^2 Note the difference between the domain of a substitution, which is defined everywhere, and a component of
a constraint mapping, which is undefined almost everywhere.
^3 The order of $\tau_1$ and $\tau_2$ is significant for the definition of $R$-compatible semi-unifiers, but not for unifiers.
^4 Any function with these properties will do in place of "new". In fact, "new" encapsulates the nondeterminism
of the problem (most general semi-unifiers are only unique up to weak equivalence) in this deterministic
specification.

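$V(\Phi, i, R, \tau_1, \tau_2) =$

    if $\tau_1 = x$ (variable) then
        if $R_i(x)$ is undefined then
            $(\{\}, R_i\{x : \tau_2\})$
        else
            $U(\Phi, R, R_i(x), \tau_2)$
        fi
    elseif $\tau_2 = y$ (variable) then
        if $(\exists v \in R^*(y))$ $\tau_1$ contains $v$ then
            ERROR: occurs-check
        else
            let $(\tau_1', \Phi') = \mathrm{new}(\Phi, \tau_1)$ in
            let $(U_1, R_1) = V(\Phi \cup \Phi', i, R, \tau_1, \tau_1')$ in
            let $(U_2, R_2) = U(U_1(\Phi), R_1, U_1(y), U_1(\tau_1'))$ in
            $(U_2 \circ U_1, R_2)$
        fi
    elseif $\tau_1 = \tau_2 = c$ (constant) then
        $(\{\}, R)$
    elseif $\tau_1 = f(\upsilon_1, \upsilon_1'), \tau_2 = f(\upsilon_2, \upsilon_2')$ then
        let $(U_1, R_1) = V(\Phi, i, R, \upsilon_1, \upsilon_2)$ in
        let $(U_2, R_2) = V(U_1(\Phi), i, R_1, U_1(\upsilon_1'), U_1(\upsilon_2'))$ in
        $(U_2 \circ U_1, R_2)$
    else
        ERROR: functor clash
    fi

$R^*(y) =$ the least $X$ such that $\{y\} \subseteq X$ and
    $(\forall x \in X)(\forall i \in \{1, 2, \ldots, k\})$ $R_i(x)$ is a variable $\Rightarrow R_i(x) \in X$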

Figure  6.1:  Functional  specification  of  V 
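
The set $R^*(y)$ used in the occurs check of $V$ is a least fixed point and can be computed with a simple worklist. The Python sketch below is an illustration under assumptions: each component $R_i$ is a `dict` from variable names to terms, terms are strings (variables) or tuples (functor applications), and the helper names are hypothetical.

```python
def term_vars(t):
    """Set of variables occurring in term t."""
    if isinstance(t, str):
        return {t}
    out = set()
    for a in t[1:]:
        out |= term_vars(a)
    return out

def closure(R, y):
    """Least X containing y such that x in X and R_i(x) a variable imply R_i(x) in X."""
    X = {y}
    work = [y]
    while work:
        x = work.pop()
        for Ri in R:
            t = Ri.get(x)                    # None when R_i is undefined at x
            if isinstance(t, str) and t not in X:
                X.add(t)
                work.append(t)
    return X

def v_occurs_check_fails(R, y, t1):
    """True if some variable of R*(y) occurs in t1 (the ERROR branch of V)."""
    return bool(closure(R, y) & term_vars(t1))

# y -> z in dimension 1; z -> w and w -> g(u) in dimension 2.
R = ({"y": "z"}, {"z": "w", "w": ("g", "u")})
reach = closure(R, "y")
```

Only variable-to-variable entries extend the set, so the chain stops at `w`, whose image `g(u)` is not a variable.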


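$U(\Phi, R, \tau_1, \tau_2) =$

    let $(\tau_1, \tau_2) =$
        if $\tau_2 = y$ (variable) then
            $(\tau_2, \tau_1)$
        else
            $(\tau_1, \tau_2)$ in

    if $\tau_1 = x$ (variable) then
        if $x = \tau_2$ then
            $(\{\}, R)$
        elseif $\tau_2$ contains $x$ then
            ERROR: occurs-check
        else
            $(U_t, R_t) := (\{x : \tau_2\}, \{x : \tau_2\}(R))$;
            for $i = 1$ to $k$ do
                if $R_i(x)$ is defined then
                    $(U_t', R_t') := V(U_t(\Phi), i, R_t, U_t(\tau_2), U_t(R_i(x)))$;
                    $(U_t, R_t) := (U_t' \circ U_t, R_t')$;
                fi
            rof;
            $(U_t, R_t)$
        fi
    elseif $\tau_1 = \tau_2 = c$ (constant) then
        $(\{\}, R)$
    elseif $\tau_1 = f(\upsilon_1, \upsilon_1'), \tau_2 = f(\upsilon_2, \upsilon_2')$ then
        let $(U_1, R_1) = U(\Phi, R, \upsilon_1, \upsilon_2)$ in
        let $(U_2, R_2) = U(U_1(\Phi), R_1, U_1(\upsilon_1'), U_1(\upsilon_2'))$ in
        $(U_2 \circ U_1, R_2)$
    else
        ERROR: functor clash
    fi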

Figure 6.2: Almost-functional specification of U

purposes, a LISP-like "gensym" function used inside of the function "new" would be sufficient
(and preferable). For simplicity both $V$ and $U$ are formulated for the functors $f$ and $c$ with
arities 2 and 0, respectively.

Without going into too much detail we interpret the definitions of $V$ and $U$ as the least fixed
points over suitable flat domains or, more prosaically, by any one of a number of computation
rules (cf. [71]).

Lemma 32  Let $\tau_1, \tau_2 \in T$ be terms. Let $\Phi$ be a recursive subset of $V$ containing all variables
occurring in $\tau_1$ and $\tau_2$ such that $V - \Phi$ is infinite. Let $R$ be a constraint mapping.

1.  If $V(\Phi, i, R, \tau_1, \tau_2)$ terminates with an error, then there is no $R$-compatible semi-unifier in
the $i$-th dimension of $\tau_1$ and $\tau_2$. If $(U', R') = V(\Phi, i, R, \tau_1, \tau_2)$ terminates without error,
then $U'$ is a $\Phi$-maximal $R$-compatible semi-unifier in the $i$-th dimension (via $R'$) of $\tau_1$ and
$\tau_2$; that is,

(a)  $R_i'(U'(\tau_1)) = U'(\tau_2)$

(b)  $(\forall j \in \{1, \ldots, k\})(\forall x \in D(R_j))\ R_j'(U'(x)) = U'(R_j(x))$

(c)  For any $R$-compatible semi-unifier $T$ in the $i$-th dimension of $\tau_1$ and $\tau_2$ there is a
substitution $S$ such that $(\forall x \in \Phi)\ S(U'(x)) = T(x)$

2.  If $U(\Phi, R, \tau_1, \tau_2)$ terminates with an error, then there is no $R$-compatible unifier of $\tau_1$
and $\tau_2$. If $(U', R') = U(\Phi, R, \tau_1, \tau_2)$ terminates without error, then $U'$ is a $\Phi$-maximal
$R$-compatible unifier (via $R'$) of $\tau_1$ and $\tau_2$; that is,

(a)  $U'(\tau_1) = U'(\tau_2)$

(b)  $(\forall j \in \{1, \ldots, k\})(\forall x \in D(R_j))\ R_j'(U'(x)) = U'(R_j(x))$

(c)  For any $R$-compatible unifier $T$ of $\tau_1$ and $\tau_2$ there is a substitution $S$ such that
$(\forall x \in \Phi)\ S(U'(x)) = T(x)$

The  proof  of  this  lemma  is  by  simultaneous  computational  induction  over  the  definitions 
of  V  and  U.  Its  details  are  truly  tedious,  but  they  are  available  as  a  manuscript  [36].  The 
constraint  mapping  R  passed  as  an  argument  to  V  and  U  encodes  the  inequational  constraints 
encountered  during  the  course  of  the  computation.  Any  further  substitution  has  to  be  compatible 
with these constraints in the sense that it must preserve these inequational constraints. This
"preservation" of constraints is captured in the notion of $R$-compatible semi-unifiers and unifiers.
The  lemma  states  that  V  and  U  return  the  most  general  semi-unifiers  and  unifiers,  respectively, 
that  are  compatible  with  the  input  constraints  R.  From  this  lemma  we  obtain  immediately  a 
routine  W  (see  figure  6.3)  that  computes  a  most  general  semi-unifier  for  every  "normal  form" 
SEI  S  with  at  most  one  equation  and  one  inequality  per  inequality  group. 

Theorem 19   Let $S$ be a system of equations and inequalities consisting of singleton sets only.
If $W(S)$ does not terminate or terminates with an error then $S$ has no solution. If $U' = W(S)$
terminates without error then $U'$ is a most general semi-unifier of $S$.

This  specification  has  already  appeared  in  [38].  We  have  also  implemented  it  in  SETL  [108] 
and tested it on several examples.

$W(S) =$

    (Assume $S = (\tau_0 = \tau_0', \tau_1 \le \tau_1', \ldots, \tau_k \le \tau_k')$)

    $(U_t, R_t) := (\{\}, (\{\}, \ldots, \{\}))$;    ($k$ times $\{\}$)

    $(U_t, R_t) := U(\mathrm{vars}(S), R_t, \tau_0, \tau_0')$;

    for $i = 1$ to $k$ do

        $(U_t', R_t') := V(U_t(\Phi), i, R_t, \tau_i, \tau_i')$;

        $(U_t, R_t) := (U_t' \circ U_t, R_t')$;

    rof;

    return $U_t$;

Figure  6.3:  Pseudo-functional  specification  of  most  general  semi-unifier 

6.2      SEI-Rewriting  Specifications 

In  this  section  we  present  basic,  implementable  rewriting  specifications  for  most  general  semi- 
unifiers.  The  first  is  a  natural  and  straightforward  extension  of  the  rewriting  specification  for 
most  general  unifiers  from  [39],  which  was  expounded  by  Martelli  and  Montanari  and  used  as 
the  starting  point  for  the  development  of  efficient  unification  algorithms  [72].  This  system  is 
in  general,  though,  nonterminating.  The  second  rewriting  specification  refines  the  first  one  by 
adding an "extended" occurs check. It is conjectured to be uniformly terminating.

6.2.1      The  Naive  Rewriting  Specification 

The  first  specification,  given  in  Figure  6.4  is  straightforward,  and  similar  versions  can  be 
found  in  the  literature  (e.g.,  [15]).  This  rewriting  system  preserves  semi-unifiers  in  a  sense  that 
we  shall  make  precise  below. 

Definition 13  Let $\Rightarrow$ be a reduction relation on systems of equations and inequalities.

1.  The relation $\Rightarrow$ is sound if for every $S$, $S'$ such that $S \Rightarrow S'$ and for every semi-unifier $\sigma'$
of $S'$ there is a semi-unifier $\sigma$ of $S$ such that $\sigma|_{V(S)} = \sigma'|_{V(S)}$ (and thus $\sigma =_{V(S)} \sigma'$).

2.  The relation $\Rightarrow$ is complete if for every $S$, $S'$ such that $S \Rightarrow S'$ and for every semi-unifier
$\sigma$ of $S$ there is a semi-unifier $\sigma'$ of $S'$ such that $\sigma|_{V(S)} = \sigma'|_{V(S)}$ (and thus $\sigma =_{V(S)} \sigma'$).

Informally  speaking,  soundness  expresses  that  a  reduction  step  does  not  add  semi-unifiers, 
and  completeness  means  that  no  semi-unifiers  are  lost  in  a  reduction  step. 

Proposition  33  The  reduction  relation  defined  by  the  naive  rewriting  system  (in  Figure  6.4)  is 
sound  and  complete. 

Proof:    Induction  on  the  number  of  rewriting  steps. 

Any SEI $S$ is in normal form with respect to a reduction relation $\Rightarrow$ if there is no $S'$ such
that $S \Rightarrow S'$. If an SEI is in normal form with respect to the naive rewriting system or the
canonical rewriting system below it is easy to extract a most general semi-unifier from it.

Given an SEI $S$ with $k$ sets of inequalities we initially tag all the inequality symbols with
distinct "colors" $1, \ldots, k$ to indicate to which group of inequalities they belong. This is
done by superscripts of the inequality symbol; e.g., $\le^{(i)}$. Then nondeterministically choose
an equation or inequality and take a rewriting action depending on its form.

1.  $f(M_1, \ldots, M_k) = f(N_1, \ldots, N_k)$:
    Replace by the equations $M_1 = N_1, \ldots, M_k = N_k$.

2.  $f(M_1, \ldots, M_k) = g(N_1, \ldots, N_l)$ where $f$ and $g$ are distinct functors:
    Replace current SEI by $\Box$ (functor clash).

3.  $f(M_1, \ldots, M_k) = x$:
    Replace by $x = f(M_1, \ldots, M_k)$.

4.  $x = f(M_1, \ldots, M_k)$ where $x$ occurs in at least one of $M_1, \ldots, M_k$:
    Replace current SEI by $\Box$ (occurs check).

5.  $x = M$ where $x$ does not occur in $M$, but occurs in another equation or inequality:
    Replace $x$ by $M$ in all other equations or inequalities.

6.  $x = x$:
    Delete it.

7.  $f(M_1, \ldots, M_k) \le^{(i)} f(N_1, \ldots, N_k)$:
    Replace by inequalities $M_1 \le^{(i)} N_1, \ldots, M_k \le^{(i)} N_k$.

8.  $x \le^{(i)} M$ and $x \le^{(i)} N$:
    Delete one of the two inequalities and add the equation $M = N$.

9.  $f(M_1, \ldots, M_k) \le^{(i)} x$:
    Add the equation $x = f(x_1', \ldots, x_k')$ where $x_1', \ldots, x_k'$ are new variables not occurring
    anywhere else.

Notes: Without loss of generality we restrict ourselves to the minimal nonlinear alphabet $\mathcal{A} = (\{f\}, \{f \mapsto 2\})$.
Recall that $\Box$ denotes the canonical unsolvable SEI.

Figure 6.4: Naive rewriting specification

Proposition 34  Let $S$ be a system of equations and inequalities in normal form with respect to
the reduction relation defined by the naive (canonical) rewriting system in Figure 6.4.

If $S = \{x_1 = M_1, \ldots, x_k = M_k, y_1 \le N_1, \ldots, y_l \le N_l\}$ then the substitution $\sigma = \{x_1 \mapsto
M_1, \ldots, x_k \mapsto M_k\}$ is a most general idempotent semi-unifier of $S$.

Proof:     By  inspection. 

To determine a most general semi-unifier of an SEI $S$ we can apply the naive rewriting system
to it and if it terminates in a normal form $S'$ we can extract a most general semi-unifier of $S'$.
If $S' = \Box$ then $S$ is unsolvable; otherwise there is a most general semi-unifier $\sigma'$ of $S'$ according
to proposition 34. As a result of proposition 33 the restriction $\sigma'|_{V(S)}$ (or $\sigma'$ itself) is a most
general semi-unifier of $S$.
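
Rules (1) to (6) of the naive system act on equations only and coincide with the familiar rules for ordinary unification. The Python sketch below illustrates just that equational fragment; it is not the SETL implementation mentioned earlier. The term encoding (strings for variables, tuples for functor applications), the use of `None` for the unsolvable system, and the shortcut that discards any equation with syntactically identical sides (a mild generalization of rule 6) are assumptions of the sketch.

```python
def occurs(x, t):
    """True if variable x occurs in term t."""
    if isinstance(t, str):
        return t == x
    return any(occurs(x, a) for a in t[1:])

def subst(x, m, t):
    """Replace variable x by term m everywhere in t."""
    if isinstance(t, str):
        return m if t == x else t
    return (t[0],) + tuple(subst(x, m, a) for a in t[1:])

def solve_equations(eqs):
    """Apply rules (1)-(6) until none applies.
    Returns a solved form {x: M}, or None for the unsolvable system."""
    todo = list(eqs)
    solved = {}
    while todo:
        s, t = todo.pop()
        if s == t:                                   # rule 6 (identical sides)
            continue
        if isinstance(s, tuple) and isinstance(t, tuple):
            if s[0] != t[0] or len(s) != len(t):
                return None                          # rule 2: functor clash
            todo.extend(zip(s[1:], t[1:]))           # rule 1: decompose
        else:
            if isinstance(s, tuple):                 # rule 3: orient as x = M
                s, t = t, s
            if occurs(s, t):
                return None                          # rule 4: occurs check
            # rule 5: replace x by M in all other equations
            todo = [(subst(s, t, a), subst(s, t, b)) for a, b in todo]
            solved = {x: subst(s, t, m) for x, m in solved.items()}
            solved[s] = t
    return solved

result = solve_equations([(("f", "x", ("g", "y")), ("f", ("g", "z"), "x"))])
```

The inequality rules (7) to (9) would operate on a second worklist alongside `todo`; they are omitted here because rule (9) is exactly the step whose termination is at issue.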

6.2.2     The  Canonical  Rewriting  Specification 

There are systems of equations and inequalities for which there is no finite rewriting derivation in
the naive rewriting system; that is, no sequence of rewriting steps such that after a finite number
of steps no more rewritings are possible. Consider, for example, the system $S_0 = \{f(x, g(y)) \le
f(y, x)\}$. It is easy to see that there is always at least one rule applicable.

The main reason for nontermination is that the last inequality rule, rule (9), introduces new
variables every time it is executed. Replacing it with the deceivingly pleasing rule [97]

    $f(M_1, \ldots, M_k) \le^{(i)} x$:
    Add the equation $x = f(M_1, \ldots, M_k)$.

indeed eliminates the nontermination problem of rewriting derivations, but also its completeness.
To see this, consider, for example, the system $S_1 = \{f(g(y), g(y)) \le f(x, g(g(y)))\}$. There is a
derivation that would lead us to claim, incorrectly, that $S_1$ has no semi-unifiers.
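
That $S_1$ is in fact solvable can be confirmed by checking a candidate directly. The Python sketch below assumes the definition of semi-unifier from the earlier chapters in the following form: $\sigma$ solves every equation, and for each inequality group there is one quotient substitution $\rho$ with $\rho(\sigma(M)) = \sigma(N)$ for every inequality $M \le N$ in the group. The term encoding and all function names are hypothetical.

```python
def subst(sigma, t):
    """Apply substitution sigma (identity outside its keys) to term t."""
    if isinstance(t, str):
        return sigma.get(t, t)
    return (t[0],) + tuple(subst(sigma, a) for a in t[1:])

def match(pattern, target, rho):
    """Extend rho so that rho(pattern) == target; return None if impossible."""
    if isinstance(pattern, str):
        if pattern in rho:
            return rho if rho[pattern] == target else None
        rho[pattern] = target
        return rho
    if (isinstance(target, str) or pattern[0] != target[0]
            or len(pattern) != len(target)):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        if match(p, t, rho) is None:
            return None
    return rho

def is_semi_unifier(sigma, equations, groups):
    """groups: one list of (M, N) inequalities per color."""
    if any(subst(sigma, m) != subst(sigma, n) for m, n in equations):
        return False
    for group in groups:
        rho = {}                       # one quotient substitution per group
        for m, n in group:
            if match(subst(sigma, m), subst(sigma, n), rho) is None:
                return False
    return True

gy = ("g", "y")
S1 = [[(("f", gy, gy), ("f", "x", ("g", gy)))]]
good = is_semi_unifier({"x": ("g", gy)}, [], S1)   # x := g(g(y))
naive = is_semi_unifier({"x": gy}, [], S1)         # x := g(y), as the modified rule would force
```

Under these assumptions the binding of $x$ to $g(g(y))$ is accepted with quotient substitution $y \mapsto g(y)$, while the binding forced by the modified rule is rejected.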

If we reconsider system $S_0$ it is easy to see that it is unsolvable. This is due to the fact that
the inequalities

    $g(y) \le x$
    $x \le y$

are not uniformly or nonuniformly solvable. If we denote the length of a term $M$ by $|M|$, then
any solution $M_1$ for $x$ and $M_2$ for $y$ would have to satisfy the numeric inequalities $|M_1| \le |M_2|$
and $|M_1| \ge |g(M_2)| \ge |M_2| + 1$, which is clearly impossible. We can catch this case by refining
rule (9) with an "extended" occurs check. More precisely, let us call the rewriting system with
rule (9) replaced by the rules in Figure 6.5 the canonical rewriting system.

Proposition 35  The reduction relation defined by the rewriting system in Figure 6.4 with rule
(9) replaced by the rules (9.1) and (9.2) from Figure 6.5 is sound and complete.

Proof:     See discussion of system $S_0$.

For any reduction relation $\Rightarrow$ with a notion of normal form, an effective (one-step) normalizing
strategy is a polynomial-time computable function $F$ such that if $F(S) = S$ then $S$ is a normal
form and, otherwise, if $F(S) = S'$ then $S \Rightarrow S'$; and furthermore, if $S \Rightarrow^* S'$, $S'$ a normal form,
then $F^n(S) = F^{n+1}(S)$ for some $n \in \mathbb{N}$.

Even  though  there  are  still  infinite  rewriting  derivations  possible  in  the  canonical  rewriting 
system  we  conjecture  that  there  is  an  effective  normalizing  strategy  for  the  canonical  rewriting 
system.

(9.1)  $f(M_1, \ldots, M_k) \le^{(i_0)} x$ and there are variables $x_0, \ldots, x_n$ such that $x = x_0$, $x_i \le^{(i_i)}
x_{i+1}$ are inequalities in the current SEI for $0 \le i \le n - 1$ and some colors $i_1, \ldots, i_{n-1}$,
and there exists a $j$ such that $x_n$ occurs in $M_j$:

    Replace current SEI by $\Box$ (extended occurs check).

(9.2)  $f(M_1, \ldots, M_k) \le^{(i_0)} x$ and there is no sequence of variables $x_0, \ldots, x_n$ such that
$x = x_0$, $x_i \le^{(i_i)} x_{i+1}$ are inequalities in the current SEI for $0 \le i \le n - 1$ and some
colors $i_1, \ldots, i_{n-1}$, and $x_n$ occurs in some $M_j$:

    Add the equation $x = f(x_1', \ldots, x_k')$ where $x_1', \ldots, x_k'$ are new variables not occurring
    anywhere else.


Figure  6.5:  Extended  occurs  check 
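
The side condition of rule (9.1) amounts to a reachability test along variable-to-variable inequalities of any color. The Python sketch below illustrates that test under assumptions: inequalities are triples `(M, N, color)`, terms are strings (variables) or tuples (functor applications), the chain may have length zero (so $x$ itself counts as reachable), and the function names are hypothetical.

```python
def term_vars(t):
    """Set of variables occurring in term t."""
    if isinstance(t, str):
        return {t}
    out = set()
    for a in t[1:]:
        out |= term_vars(a)
    return out

def extended_occurs_check(ineqs):
    """True if rule (9.1) applies to some inequality f(...) <= x."""
    succ = {}                                   # x_i -> {x_{i+1}} for x_i <= x_{i+1}
    for m, n, _color in ineqs:
        if isinstance(m, str) and isinstance(n, str):
            succ.setdefault(m, set()).add(n)
    for m, n, _color in ineqs:
        if isinstance(m, tuple) and isinstance(n, str):
            reach, work = {n}, [n]              # variables reachable from x
            while work:
                v = work.pop()
                for w in succ.get(v, ()):
                    if w not in reach:
                        reach.add(w)
                        work.append(w)
            if reach & term_vars(m):            # some x_n occurs in some M_j
                return True
    return False

# S_0 after decomposition by rule (7): x <= y and g(y) <= x.
s0 = [("x", "y", 1), (("g", "y"), "x", 1)]
fails = extended_occurs_check(s0)
```

Colors are carried along but ignored by the test, since the rule allows chains of arbitrary colors.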

Conjecture 1  There exists an effective normalizing strategy for the canonical rewriting system
such that the strategy admits only finite rewriting derivations.

In  fact  we  believe  that  any  strategy  that  executes  rule  (9.2)  only  if  there  are  no  other  rules 
applicable  satisfies  this  conjecture  (see  chapter  7). 

An  immediate  consequence  of  this  conjecture  is  the  decidability  of  semi-unification. 

Conjecture  2    The  set  of  all  solvable  systems  of  equations  and  inequalities  is  decidable. 

6.3      Graph  Rewriting  Specification 

It  is  probably  easier  to  analyze  the  extended  occurs  check  in  both  the  functional  specification  and 
the  SEI-rewriting  specification  in  a  graph-theoretic  setting  since  it  is  a  (syntactically)  nonlocal 
condition. This formulation is a good starting point for the analysis of termination properties,
for practical implementations, and for optimizations for subcases of general semi-unification,
such as uniform semi-unification.

6.3.1      Arrow  graphs 

Recall  that  term  graphs  are  (nonunique)  representations  for  sets  of  terms.  Arrow  graphs  are 
term  graphs  with  additional  structure  to  represent  SEI's. 

Definition 14  An arrow graph (with $k$-colored arrows) $G$ is a sextuple $(N, N_F, E, L, A, \sim)$
where $|G| = (N, N_F, E, L)$ is a term graph (over $\mathcal{A}$), $A = (A_1, \ldots, A_k)$ is a $k$-tuple, $A_i \subseteq N \times N$,
for $1 \le i \le k$; the elements of $A_i$ are called arrows; and $\sim$ is an equivalence relation on $N$.

We can think of an arrow in $A$ as colored by $1, \ldots, k$ indicating to which $A_i$ it belongs. We
may write $m \xrightarrow{i} n$ for $(m, n) \in A_i$ whenever $A$ and $A_i$ are understood from the context.

An arrow graph representation of SEI $S = (M_0 = N_0, M_1 \le N_1, \ldots, M_k \le N_k)$ is an arrow
graph $G$ whose underlying term graph, $|G|$, represents all the terms occurring in $S$; $G$ contains
arrows $m_i \xrightarrow{i} n_i$ if $[m_i] = M_i$, $[n_i] = N_i$ for $1 \le i \le k$; and $\sim$ in $G$ is the smallest equivalence
relation containing $m_0 \sim n_0$ if $[m_0] = M_0$, $[n_0] = N_0$. In other words, the colored arrows encode
inequalities, and the equivalence relation encodes equations.
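
One way to hold such a representation in memory is sketched below in Python. It is an illustration only and makes several assumptions: nodes are integers, the equivalence relation is kept in a union-find forest, each variable gets exactly one node while functor nodes are not shared, there is one inequality per color, colors are numbered from 0, and the class and function names are hypothetical.

```python
class ArrowGraph:
    def __init__(self, k):
        self.label = []          # node -> functor or variable name (L)
        self.is_functor = []     # node -> True for nodes in N_F
        self.children = []       # node -> tuple of child nodes (E)
        self.parent = []         # union-find forest encoding ~
        self.arrows = [set() for _ in range(k)]   # A_1, ..., A_k
        self.var_node = {}       # variable name -> its unique node

    def _new_node(self, label, is_functor, kids):
        self.label.append(label)
        self.is_functor.append(is_functor)
        self.children.append(kids)
        self.parent.append(len(self.parent))      # initially its own class
        return len(self.parent) - 1

    def add_term(self, t):
        """Add term t (str variable or tuple) and return its root node."""
        if isinstance(t, str):
            if t not in self.var_node:
                self.var_node[t] = self._new_node(t, False, ())
            return self.var_node[t]
        kids = tuple(self.add_term(a) for a in t[1:])
        return self._new_node(t[0], True, kids)

    def find(self, n):
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]   # path halving
            n = self.parent[n]
        return n

    def merge(self, m, n):
        self.parent[self.find(m)] = self.find(n)

def represent(equation, inequalities):
    """Arrow graph for (M_0 = N_0, M_1 <= N_1, ..., M_k <= N_k)."""
    g = ArrowGraph(len(inequalities))
    m0, n0 = (g.add_term(t) for t in equation)
    g.merge(m0, n0)                                # equation -> equivalence
    for i, (m, n) in enumerate(inequalities):
        g.arrows[i].add((g.add_term(m), g.add_term(n)))   # inequality -> arrow
    return g

g = represent(("x", ("f", "y", "y")), [(("f", "x", "y"), "y")])
```

Keeping the equivalence relation in a union-find structure makes "merge the equivalence classes" a near-constant-time operation, which is the usual choice in graph-based unification.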

Let $\mathcal{A} = \mathcal{A}_2$ be the usual ranked alphabet.

Definition 15  An interpretation $I$ of an arrow graph $G = (N, N_F, E, L, A, \sim)$ is a mapping of
nodes to first-order terms (with variables) over the ranked alphabet $\mathcal{A}$. $I$ is valid if there exist
quotient substitutions $R_1, \ldots, R_k$ such that

1.  $(\forall n \in N_F,\ n_1, n_2 \in N)\ L(n) = f,\ E(n) = (n_1, n_2) \Rightarrow I(n) = f(I(n_1), I(n_2))$;

2.  $(\forall n, n' \in N)\ n \sim n' \Rightarrow I(n) = I(n')$;

3.  $(\forall n, n' \in N,\ 1 \le i \le k)\ n \xrightarrow{i} n' \Rightarrow R_i(I(n)) = I(n')$.

It is easy to see that an SEI $S$ has a semi-unifier if and only if $G$, an arrow graph representation
of $S$, has a valid interpretation.

6.3.2     Algorithm  A 

Algorithm  A  in  Figure  6.6  applies  the  closure  rules  depicted  in  Figure  6.7,  repeatedly  rewriting 
the  initial  arrow  graph  representation  of  an  SEI  S  until  the  arrow  graph  does  not  change  any 
more. 

An equivalence relation on the nodes of a term graph can be interpreted as a substitution
relative to a system of (equivalence class) representatives as long as the equivalence relation is a
structural equivalence, i.e., satisfies closure rule 1 and does not trigger rule 4a. This correspondence
has been widely used in graph-theoretic formulations of unification algorithms (cf. [89]),
and we will refrain from making it precise here.

Proposition 36  Let $G$ be an arrow graph representation of SEI $S$. If algorithm A (Figure
6.6) terminates on input $G$ with an arrow graph $G' \ne \Box$, then the resulting equivalence relation
represents a most general semi-unifier of $S$. If $G' = \Box$ then $S$ is unsolvable.

Proof:

Every graph rewriting step corresponds to SEI rewriting steps in the canonical rewriting
system in Figure 6.4 with the extended occurs check rules from Figure 6.5 replacing
rule (9) on the terms represented by the term graph $|G|$, and vice versa. By
proposition 35 the canonical SEI-rewriting system computes a most general semi-
unifier.

In contrast to Mycroft's original type inference algorithm for the Mycroft Calculus there are
no known inputs that lead to nontermination of algorithm A. Nonetheless it is currently unknown
whether algorithm A terminates uniformly or whether there is any uniformly terminating
algorithm for semi-unifiability at all. Since we conjecture that both questions have an affirmative
answer, we believe that the key to the establishment of this result is an in-depth investigation of
the deep structure of sequences of arrow graphs that arise from the (nondeterministic) execution
of algorithm A. For this reason we call a sequence $\mathcal{G} = (G_1, \ldots, G_i, \ldots)$ of arrow graphs an
execution (of algorithm A) if every component in the sequence is derived from its predecessor
by one of the rewriting rules in Figure 6.6, respectively Figure 6.7. Some elementary approaches
and preliminary results are reported in chapter 7.

6.4     Arithmetic  Compression  for  Uniform  Semi-Unification

For  uniform  semi-unification  we  will  show  that  it  is  possible  to  establish  decidability.  In  fact 
algorithm A (Figure 6.6) terminates uniformly for every input in exponential time and space if

Let $G = (N, N_F, E, L, A, \sim)$. Apply the following rules (depicted also in Figure
6.7) until convergence:

1.  If there exist nodes $m$ and $n$ labeled with a functor $f$ and with children
    $m_1, m_2$ and $n_1, n_2$, respectively, such that $m \sim n$ then merge the
    equivalence classes of $m_1$ and $n_1$ and of $m_2$ and $n_2$.

2.  If there exist nodes $m$ and $n$ labeled with a functor $f$ and with children
    $m_1, m_2$ and $n_1, n_2$, respectively, such that $m \xrightarrow{i} n$ then place arrows
    $m_1 \xrightarrow{i} n_1$ and $m_2 \xrightarrow{i} n_2$.

3.  If there exist nodes $m_1$, $m_2$, $n_1$, and $n_2$ such that

    (a)  $m_1 \sim n_1$, $m_1 \xrightarrow{i} m_2$, and $n_1 \xrightarrow{i} n_2$ then merge the equivalence
         classes of $m_2$ and $n_2$;

    (b)  $m_1 \sim n_1$, $m_1 \xrightarrow{i} m_2$ and $m_2 \sim n_2$ then place an arrow $n_1 \xrightarrow{i} n_2$.

4.  (a)  (Extended occurs check) If there is a path consisting of arrows of
         any color (arrow path) from $n_1$ to $n_2$ and $n_2$ is a proper descendant
         of $n_1$, then reduce to the improper arrow graph $\Box$.

    (b)  If the extended occurs check is not applicable and there exist nodes
         $m$ and $n$ such that $m$ is labeled with functor $f$ and has children
         $m_1, m_2$, $n$ is not equivalent to a functor labeled node, and there is
         an arrow $m \xrightarrow{i} n$ then create new nodes $n', n_1', n_2'$ (each initially in
         their own equivalence class) and label $n'$ with functor $f$, label $n_1'$
         and $n_2'$ with new variables $x'$ and $x''$, respectively; make $n_1', n_2'$ the
         children of $n'$; and merge the equivalence classes of $n$ and $n'$.


Figure  6.6:  Algorithm  A 
Figure 6.7: Closure rules
the initial arrow graph is only 1-colored (see below). By a form of "arithmetic" compression it
is possible to compute most general uniform semi-unifiers in polynomial space, as shown in this
chapter. The decidability of uniform semi-unification has also been discovered by Pudlak [96]. If
the goal is only to decide uniform semi-unifiability, the algorithm can be simplified to
run in polynomial time by a result of Kapur et al. [54].

6.4.1      An  exponential  time  algorithm  for  uniform  semi-unification 

It  can  be  shown  that  algorithm  A  terminates  in  exponential  time  for  uniform  semi-unification 
under  a  deterministic  rewriting  strategy  we  shall  describe  below.  It  is  inspired  by  normalized 
executions,  in  which  rule  4  of  algorithm  A  is  only  executed  when  none  of  the  other  rules  are 
applicable.  What  permits  a  relatively  simple  termination  proof  (and  the  exponential  upper 
bound)  is  that,  for  arrow  graphs  of  one  color,  for  every  node  without  an  outarrow  there  can  be 
at  most  one  "new"  node  created  by  execution  of  rule  4.  This  property  does  not  hold  for  arrow 
graphs  with  two  or  more  colors. 

Proposition 37 The algorithm "solve" in Figure 6.8 is an exponential-time uniform
semi-unification algorithm.

Proof: 

Let us say a discrepancy in an arrow graph is a node where rule 4 can be applied;
i.e., it is a functor node $n$ with an outarrow to a variable node $n'$ that is not equivalent
to any functor node. We associate with every arrow graph $G$ of one color the triple
$(e_{n/o}, w, e)$, called the characteristic of $G$, where

1. $e_{n/o}$ is the number of equivalence classes in $G$ that have no node with an outarrow
(i.e., for no node $n$ in the equivalence class is there an arrow $n \to n'$ for any $n'$);

2. $w$ is the number of equivalence classes that contain only variable nodes, at least
one of which is reachable from a discrepancy via an arrow path (discrepancy
weight);

3. $e$ is the number of equivalence classes.

These triples are lexicographically well-ordered.
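
As an illustration of the lexicographic ordering on characteristics, the following minimal Python sketch compares a few made-up triples. The numbers and the helper name `strictly_decreasing` are assumptions chosen for illustration only; they are not taken from an actual run of "solve".

```python
# Characteristic triples (e_no, w, e), compared lexicographically.
# Python tuples compare lexicographically, so no custom ordering is needed.
# All numbers below are invented for illustration.
before = (3, 1, 7)
after_skewed_merge = (2, 4, 6)   # e_no drops, so w is allowed to grow
after_copy = (2, 3, 9)           # w drops, so e is allowed to grow


def strictly_decreasing(chars):
    """True if every triple is lexicographically smaller than its predecessor."""
    return all(later < earlier for earlier, later in zip(chars, chars[1:]))


print(strictly_decreasing([before, after_skewed_merge, after_copy]))
```

The point of the ordering is that a later component may increase as long as an earlier one strictly decreases.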

The procedure "solve" in Figure 6.8 implements a specific strategy for applying the
closure rules of algorithm A. In particular, rules 1 and 3, which merge equivalence
classes, are always applied exhaustively after any of the other steps as a
"normalization" step. Furthermore, when rule 4 is applicable at some discrepancy n then it is
clear that it can be applied recursively at every descendant of n after execution of
rule 2 at node n, until the variable leaves of n are reached; this is accomplished by
the procedure "copy". Since every new node created by copy(n) is not a descendant
of n, it is easy to see that an invocation of copy(n) creates k new nodes, if n has k
descendants (descendant equivalence classes) with no outarrow at the time copy(n)
is called.

Let  us  call  exhaustive  application  of  rules  1  and  3  a  normalization  step.  We  call 
merging  an  equivalence  class  with  an  outarrow  and  an  equivalence  class  without  an 
outarrow  a  skewed  merge. 

Now note that the exhaustive application of rules 1 and 3, if applicable at least once,
decreases the number of equivalence classes at least by one. Furthermore, if
normalization contains a skewed merge, then the discrepancy weight, $w$, may be increased,
but the number of equivalence classes without an outarrow, $e_{n/o}$, is decreased by
at least one. If normalization contains no skewed merge, the number of equivalence
classes reachable from any discrepancy does not increase and, consequently, the
discrepancy weight does not increase, either. In all cases the total number of equivalence
classes does not increase.

Application of rule 2 with subsequent normalization leads to consideration of two
cases: either the tail of the new arrow propagated has already an outarrow, or it
does not. In the first case, clearly $e_{n/o}$ is decreased by one. In the second case
$e$ is properly decreased, and there are two possibilities to consider depending on
whether the normalization contains a skewed merge. If a skewed merge occurs, $e_{n/o}$
is properly decreased. If no skewed merge occurs, it can be seen that the discrepancy
weight is not increased.

Finally, a discrepancy $n$ is minimal if there is no discrepancy $n'$ and sequence of nodes
$n_1, \ldots, n_k$ such that $n' = n_1$, $n = n_k$ and $n_i \to n_{i+1}$ or $n_i$ is a child of $n_{i+1}$ for
$1 \le i \le k - 1$ with the additional constraint that there is a $j$ such that $n_j$ is a
child of $n_{j+1}$ (see chapter 7). If there is a discrepancy, but no minimal one, then
reduction to $\Box$ is performed, since this corresponds to a "preemptive" application of
the extended occurs check. If a minimal discrepancy exists, then rule 4b is applicable
at a minimal discrepancy (there may be several minimal discrepancies). Instead of
applying rule 4b only once, algorithm "solve" applies routine "copy", which is an
exhaustive application of rule 4b to the original discrepancy and, recursively, all its
descendants, facilitated by intermediate applications of rules 2 and 3. Application
of "copy" to the children of a minimal discrepancy terminates in time $O(e_{n/o})$ and
decreases the discrepancy weight by one. Although $e$ is properly increased, $e_{n/o}$ is
not.^1

This  shows  that  every  iteration  through  the  loop  strictly  decreases  the  characteristic 
of  the  rewritten  arrow  graph.  Consequently  the  procedure  solve  terminates  uniformly. 
Furthermore,  since  w  is  bounded  by  e,  and  e  is  only  increased  by  execution  of  "copy", 
it can be seen that $e$ at most doubles every time $e_{n/o}$ decreases by one. Clearly every
iteration  of  the  loop  in  "solve"  is  executed  in  polynomial  time  with  respect  to  the  size 
(number  of  nodes)  of  the  arrow  graph  before  the  iteration.  This  shows  that  solve(G) 
terminates in exponential time; i.e., in time $O(2^{c n^{k}})$, for some constants $c$, $k$, where $n$ is the
number of nodes in $G$.


6.4.2     Interaction  Graphs 

Notice that a 1-colored arrow graph G will be transformed by algorithm A into a possibly
exponentially bigger arrow graph with many "new" nodes (introduced by rule 4 in Figure 6.6).
Let us call the nodes that are in the input graph G "original" nodes and all other nodes that are
added by A "new" nodes. If we consider all arrow paths in the "evolved" graph after a number of
graph rewriting steps it can be seen from the closure rules that almost all the relevant information
about arrow paths in an execution of algorithm A can be computed from other information about
arrow paths. For the most part, it is sufficient to consider only arrow paths from original nodes
to original nodes. Since there may be arrow paths from two original nodes to a common new
node we have to consider, more generally, all the possible ways in which arrow paths from two


^1 The fact that $e_{n/o}$ is not increased by application of "copy" is at the heart of why this termination proof works
for uniform semi-unification, but not for general semi-unification.
Recall the closure rules of algorithm A, Figure 6.6, also depicted in Figure 6.7.

solve(G) =
    repeat
        apply rules 1 and 3 exhaustively;
        if rule 2 is applicable then
            apply it (once);
        else if rule 4 is applicable then
            if there is a minimal discrepancy n then
                (L(n) = f, E(n) = (n_1, n_2))
                create a new functor node n', L(n') = f,
                    with children copy(n_1) and copy(n_2);
                place an arrow from n to n';
            else
                reduce to $\Box$ (extended occurs check);
            end if
        end if
    until no more rules are applicable;


copy(n) =
    if n has an outarrow to some node n' then
        return n';
    else if n is equivalent to a functor node n',
            L(n') = f, E(n') = (n'_1, n'_2) then
        create new functor node n'', L(n'') = f,
            with children copy(n'_1) and copy(n'_2);
        return n'';
    else
        create new variable node n', L(n') = x', where x' is a new variable;
        return n';
    end if

Figure 6.8: Exponential-time uniform semi-unification algorithm
Let $G = (N, N_F, L, E, C)$ be an interaction graph. $G$ is normal if it satisfies the following
closure rules.

1. For $n, n_1, n_2, n', n'_1, n'_2 \in N$ such that $E(n) = (n_1, n_2)$, $E(n') = (n'_1, n'_2)$, if $(l, l') \in
C(n, n')$ then $(l, l') \in C(n_1, n'_1)$ and $(l, l') \in C(n_2, n'_2)$.

2. For $n_0, n_1, n_2, n_3 \in N$, if $(l_{01}, l_{10}) \in C(n_0, n_1)$, $(l_{23}, l_{32}) \in C(n_2, n_3)$, $(l_{02}, l_{20}) \in C(n_0, n_2)$
then $(l_{10} + (l_{02} - l_{01}),\ l_{32} + (l_{20} - l_{23})) \in C(n_1, n_3)$
if the differences above are nonnegative.

3. If $(l, l') \in C(n, n')$ then $(l', l) \in C(n', n)$.

4. $(0, 0) \in C(n, n)$.

5. If $(l, l') \in C(n, n')$ then $(l + 1, l' + 1) \in C(n, n')$.

Figure 6.9: Consistency rules for uniform semi-unification

original nodes "merge" together, if at all. Consequently it is not necessary to explicitly construct
new nodes, only all relevant information about arrow paths from pairs of original nodes. Since
we assume only 1-colored graphs the arrow paths in question are completely characterized by
their starting point, end point and length. Since the length can be stored in space $O(\log n)$,
where $n$ is the length itself, this representation of arrow paths yields a space compression due to
this "arithmetization" of arrow paths. Indeed we can thus devise an algorithm that computes a
most general uniform semi-unifier in polynomial space. The details are below.
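
To make the space saving concrete, here is a minimal Python sketch, assuming a hypothetical record `ArrowPath` and helper `bits_needed` (neither name comes from the text). It stores a 1-colored arrow path as a (start, end, length) triple and reports how many bits the length takes in binary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrowPath:
    """A 1-colored arrow path, identified by its endpoints and its length."""
    start: str   # name of an original node (illustrative)
    end: str     # name of an original node (illustrative)
    length: int  # number of arrows on the path


def bits_needed(path: ArrowPath) -> int:
    """Number of bits required to write the path length in binary."""
    return max(1, path.length.bit_length())


# A path of exponential length still has a short description.
long_path = ArrowPath("m", "n", 2 ** 40)
print(bits_needed(long_path))
```

Materializing the same path node by node would need on the order of $2^{40}$ new nodes, whereas the triple needs only a few dozen bits for the length.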

Definition 16 (Interaction graph)

An interaction graph (of degree 1) is a term graph over $A_2$ with an additional consistency
mapping $C : N \times N \to 2^{\mathcal{N} \times \mathcal{N}}$. A normal interaction graph is an interaction graph whose
consistency sets satisfy the rules in Figure 6.9.

An interaction graph representation of an SEI $S$ is very similar to an arrow graph
representation (for uniform semi-unification problems). In particular, both inequalities and equations can
be encoded in a single consistency mapping. Specifically, an interaction graph representation of
$S = (M_0 = N_0, M_1 \le N_1)$ is an interaction graph $G$ whose underlying term graph, $|G|$,
represents all the terms occurring in $S$; and the consistency mapping in $G$ is the smallest $C$ such that
$(0, 0) \in C(m_0, n_0)$, $(1, 0) \in C(m_1, n_1)$ if $[m_i] = M_i$, $[n_i] = N_i$, $0 \le i \le 1$.

Let A = A3 be the usual ranked alphabet.

Definition 17 An interpretation $I$ of an interaction graph $G = (N, N_F, L, E, C)$ is a mapping
of nodes to first-order terms (with variables) over the ranked alphabet A. $I$ is valid if there is a
quotient substitution $R$ such that

1. $(\forall n \in N_F,\ n_1, n_2 \in N)$ $L(n) = f$, $E(n) = (n_1, n_2) \Rightarrow I(n) = f(I(n_1), I(n_2))$;

2. $(\forall n, n' \in N,\ l, l' \in \mathcal{N})$ $(l, l') \in C(n, n') \Rightarrow R^{l}(I(n)) = R^{l'}(I(n'))$.

It is clear that for every interaction graph $G$ with consistency mapping $C$ there is a unique
smallest normal interaction graph $\bar{G}$ with the same term graph as $G$ and a consistency mapping
$\bar{C}$ that contains $C$. It is also easy to check that $I$ is a valid interpretation for $G$ if and only if $I$
is a valid interpretation for $\bar{G}$, and SEI $S$ has a uniform semi-unifier if and only if an interaction
graph representation $G$ of $S$ has a valid interpretation.

Now, a somewhat more complicated analog of the extended occurs check of algorithm A,
applied to a normal interaction graph $G$, determines whether there is a valid interpretation for
$G$ and, consequently, whether the SEI that $G$ represents has a uniform semi-unifier.

For $n' \in N_F$, $n, n'_1, n'_2 \in N$, $E(n') = (n'_1, n'_2)$ we say $(n'_1, l')$ (respectively $(n'_2, l')$) is a direct left
(right) descendant of $(n, l)$ with respect to $C$ if $(l, l') \in C(n, n')$. The transitive closure of this
relation defines proper descendancy, and the reflexive-transitive closure defines descendancy.

Theorem 20 Let $G$ be an interaction graph representing an SEI $S$ over $A_2$, and let $\bar{G}$ be the
smallest normal interaction graph containing $G$, where the consistency mapping in $\bar{G}$ is $\bar{C}$. Then
$S$ is uniformly semi-unifiable if and only if for no $n \in N$ and $l, d \in \mathcal{N}$, $(n, l + d)$ is a proper
descendant of $(n, l)$ in $\bar{C}$.

Proof:

First we shall prove that if there is $n \in N$ and $l, d \in \mathcal{N}$ such that $(n, l + d)$ is a proper
descendant of $(n, l)$ in $\bar{C}$, then $G$ has no valid interpretation. Assume $I$ is a valid
interpretation with quotient substitution $R$. If $(n', l')$ is a direct descendant of $(n, l)$
in $\bar{C}$, then $|R^{l'}(I(n'))| < |R^{l}(I(n))|$, and, by induction, this holds also if $(n', l')$ is a
proper descendant of $(n, l)$ in $\bar{C}$. If $(n, l + d)$ is a proper descendant of $(n, l)$ this means
that $|R^{d}(R^{l}(I(n)))| < |R^{l}(I(n))|$; but this is manifestly impossible since applying a
substitution to a term cannot make the (tree) size of a term smaller. Consequently
there cannot be a valid interpretation for $G$.

Conversely, if there is no $(n, l + d)$ that is a proper descendant of $(n, l)$, then we can
define a valid interpretation for $G$. Assume all variables in $V$ are totally ordered in
some fashion. We may assume that the underlying term graph of $G$ has exactly one
node labeled $x$ for every variable $x$ occurring in $S$. Thus the ordering on variables
extends uniquely to nodes. We can also extend it lexicographically to node-number
pairs, where the ordering on numbers is the standard arithmetic ordering.

I(n, l) =

if  {n,l)  has  no  proper  descendant  then 
(n  is  variable  labeled) 
let  {n',l')  be  the  least  (n",l")  such  that 

{n",n<{n,l)- 
return  y^'  '  (where  L[n)  —  y) 
else 

let (m', l'), (m'', l'') be direct left,
respectively right, descendants of (n, l);
return f(I(m', l'), I(m'', l''))
end  if 

Now, it is routine to check that $I(n) = I(n, 0)$ defines a valid
interpretation, since the closure properties of the consistency rules guarantee that the
definition is well-defined, and the recursion must terminate if there is no $(n, l)$ that
has $(n, l + d)$, $d > 0$, as a proper descendant.
Let $C \subseteq \mathcal{N} \times \mathcal{N}$. $C$ is consistently closed if the following closure rules are satisfied.

1. If $(l_1, l_2), (l_3, l_2), (l_3, l_4) \in C$ then $(l_1, l_4) \in C$.

2. If $(l, l') \in C$ then $(l + d, l' + d) \in C$, $d \ge 0$.

Figure 6.10: Consistently closed relations

6.4.3     A  polynomial  space  algorithm  for  uniform  semi-unification 

The consistency mapping in an interaction graph maps pairs of nodes to infinite sets. In order
to transform the closure rules for interaction graphs into an algorithm it is necessary to find a
finite representation and effective means of manipulating it. We can consider a given set $C(n, n')$
and "close" it with respect to the consistency rules in the "trivial" term graph consisting only
of nodes $n$ and $n'$ (and no edges or other nodes). In this sense every set of pairs of nonnegative
numbers generates, independent of any term graph, a unique smallest set of pairs of numbers
that is closed with respect to the consistency rules. We shall show that every such closed set
can be represented by at most two pairs of numbers, and that the consistency rules that involve the
structure of a given term graph, namely rule 1 and rule 2, can be encoded by effective operations
on such pairs of numbers. The details are below.

A binary relation $C$ on the natural numbers is consistently closed if the closure rules in Figure
6.10 are satisfied.

For  consistently  closed  relations  we  have  the  following  proposition. 

Proposition 38 Let $C$ be a consistently closed relation, and let $l_1, l_2, d, d' \in \mathcal{N}$, $d' > d > 0$.
Then

1. $(l_1, l_2), (l_1 + d, l_2) \in C \Rightarrow (l_1, l_2 + d) \in C$ and $(l_1, l_2), (l_1, l_2 + d) \in C \Rightarrow (l_1 + d, l_2) \in C$;

2. $(l_1, l_2), (l_1, l_2 + d), (l_1, l_2 + d') \in C \Rightarrow (l_1, l_2 + (d' - d)) \in C$;

3. $(l_1, l_2), (l_1, l_2 + d), (l_1, l_2 + d') \in C \Rightarrow (l_1, l_2 + \gcd(d, d')) \in C$.

Proof:

1. If $(l_1, l_2), (l_1 + d, l_2) \in C$, then $(l_1 + d, l_2 + d) \in C$ by rule 2 of the definition of
consistently closed relations (Figure 6.10), and, by rule 1, $(l_1, l_2 + d) \in C$. The other
case is symmetric.

2. If $(l_1, l_2), (l_1, l_2 + d), (l_1, l_2 + d') \in C$, then $(l_1 + (d' - d), l_2 + d + (d' - d)) = (l_1 +
(d' - d), l_2 + d') \in C$ by rule 2. Since $(l_1 + (d' - d), l_2 + d'), (l_1, l_2 + d'), (l_1, l_2) \in C$
we have $(l_1 + (d' - d), l_2) \in C$ by rule 1. The result follows by case 1 above.

3. Note that, by induction on Euclid's gcd algorithm, if for any property $P(d)$
over the natural numbers we have $(\forall d, d' \in \mathcal{N}, d' > d)$ $P(d)$ and $P(d') \Rightarrow
P(d' - d)$ then it also holds that $P(d)$ and $P(d') \Rightarrow P(\gcd(d, d'))$. If we let
$P(d) \equiv (l_1, l_2 + d) \in C$, then the result follows from case 2.
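
Proposition 38 can be checked on small instances by brute force. The following Python sketch closes a finite set of pairs under the two rules of Figure 6.10, but only inside the box of pairs with components at most `bound`. That truncation is an assumption made here to keep the relation finite, and the function name `bounded_closure` is hypothetical. The result is always a subset of the true closure, so a pair found here really is in the closure.

```python
def bounded_closure(pairs, bound):
    """Close `pairs` under rules 1 and 2 of Figure 6.10, restricted to
    pairs whose components are at most `bound` (an illustrative cutoff)."""
    closed = {p for p in pairs if max(p) <= bound}
    changed = True
    while changed:
        new = set()
        # Rule 2: shifting by 1 repeatedly yields every shift d >= 0.
        for (a, b) in closed:
            if a + 1 <= bound and b + 1 <= bound:
                new.add((a + 1, b + 1))
        # Rule 1: (l1, l2), (l3, l2), (l3, l4) in C  =>  (l1, l4) in C.
        for (l1, l2) in closed:
            for (l3, mid) in closed:
                if mid != l2:
                    continue
                for (first, l4) in closed:
                    if first == l3:
                        new.add((l1, l4))
        changed = not new <= closed
        closed |= new
    return closed


# Part 1: (0, 0) and (0 + 2, 0) in C give (0, 0 + 2) in C.
print((0, 2) in bounded_closure({(0, 0), (2, 0)}, 8))
# Part 3: (0, 0), (0, 4), (0, 6) in C give (0, gcd(4, 6)) = (0, 2) in C.
print((0, 2) in bounded_closure({(0, 0), (0, 4), (0, 6)}, 8))
```

Both membership tests succeed with derivations that stay inside the box, following the steps of the proof above.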

The  significance  of  consistently  closed  relations  is  summarized  in  the  following  proposition. 

Proposition 39 Let $G$ be a normal interaction graph with consistency mapping $C$. Then
$C(n, n')$ is consistently closed for all nodes $n, n'$.

Proof:

Closure property 2 is established by simple induction on $d$ and rule 5 in the
consistency rules for normal interaction graphs (Figure 6.9).

Property 1 is a special case of rule 2 in Figure 6.9. Consider $C(n, n')$. If
$(l_1, l_2), (l_3, l_2), (l_3, l_4) \in C(n, n')$, then $(l_2, l_1) \in C(n', n)$, and likewise
$(l_2, l_3) \in C(n', n)$, by rule 3 of Figure 6.9.
With $n_0 = n'$, $n_1 = n$, $n_2 = n$, $n_3 = n'$, $l_{10} = l_1$, $l_{01} = l_{02} = l_2$, $l_{20} = l_{23} = l_3$, $l_{32} = l_4$
it follows by rule 2 that $(l_{10}, l_{32}) = (l_1, l_4) \in C(n, n')$.

Drawing on terminology from algebra, we shall say a relation $B$ spans a consistently closed
$C$ if the smallest consistently closed relation containing $B$ is $C$; we shall denote this by $\langle B \rangle = C$.
If no set with cardinality smaller than $B$ spans $C$, then $B$ is a basis of $C$. A set $B$ is independent
if no proper subset of $B$ is a basis.

The following theorem is at the heart of our uniform semi-unification algorithm.

Theorem 21 1. Every consistently closed relation $C$ has a basis of cardinality at most 2;
i.e., there exist $l_1, l_2, l'_1, l'_2 \in \mathcal{N}$ such that $\langle (l_1, l_2), (l'_1, l'_2) \rangle = C$.^2

2. For every consistently closed relation $C$ there exist unique $l, l', c \in \mathcal{N}$, $k \ge -c$ such that
$C = \langle (l, l'), (l + k, l' + k + c) \rangle$.

Proof: 

Part 1 follows immediately from part 2, of course.

Let $C$ be a consistently closed relation. If $C$ is empty, then the empty set is a basis of $C$,
and we are done. Otherwise, let $l'$ be the smallest number with $(l'', l') \in C$ for some $l''$,
and let $l$ be the smallest $l$ such that $(l, l') \in C$. If $\langle (l, l') \rangle = C$, we are done. Otherwise,
let $c$ be the smallest positive number such that $(l + k'', l' + k'' + c) \in C$ for some
(possibly negative) integer $k''$. Let $k$ be the smallest $k$ such that $(l + k, l' + k + c) \in C$.
Note that $k \ge -c$ by definition of $l'$ and $l$. Clearly, $\langle (l, l'), (l + k, l' + k + c) \rangle \subseteq C$.
We shall now show that $\langle (l, l'), (l + k, l' + k + c) \rangle \supseteq C$. Let $(l_1, l_2)$ be any pair in $C$.
There exist unique integers $k', c'$ such that $(l_1, l_2) = (l + k', l' + k' + c')$. There are
three cases to consider: $c' = 0$, $c' > 0$, and $c' < 0$.

$c' = 0$: If $c' = 0$, then it must be that $k' \ge 0$. But then $(l_1, l_2) \in \langle (l, l') \rangle$ (by rule 2 of
consistently closed relations) and $(l_1, l_2) \in \langle (l, l'), (l + k, l' + k + c) \rangle$.

$c' > 0$: By construction of $l'$ and $l$ it must be that $k' \ge -c'$. Consequently $(k + c) +
(k' + c') \ge 0$, $c + (k' + c') \ge 0$, $c' + (k + c) \ge 0$. By rule 2 for consistently closed
relations we can conclude that $(l + (k + c) + (k' + c'),\ l' + (k + c) + (k' + c'))$,
$(l + k + c + (k' + c'),\ l' + k + c + (k' + c') + c)$,
$(l + k' + c' + (k + c),\ l' + k' + c' + (k + c) + c') \in C$.
From the previous proposition we have
$(l + (k + c) + (k' + c'),\ l' + (k + c) + (k' + c') + \gcd(c, c')) \in C$.
By definition of $c$ this implies that $c \le \gcd(c, c')$ and thus $c' = ic$ for some
$i \in \mathcal{N}$. With the looping rule we can show that $(l + k', l' + k' + c' + c) \in C$ and,
consequently, $(l + k', l' + k' + c) \in C$. This shows that $k' \ge k$ by definition of $k$.
And furthermore, since $(l + k', l' + k' + c') = (l + k + d, l' + k + d + ic)$ for some
$d, i \in \mathcal{N}$ this shows that $(l + k', l' + k' + c') \in \langle (l, l'), (l + k, l' + k + c) \rangle$.

$c' < 0$: This case can be reduced to the case $c' > 0$, since, by the previous proposition,
if $k \ge 0$ then $\{(l, l'), (l + k + c, l' + k)\}$ is also in $\langle (l, l'), (l + k, l' + k + c) \rangle$; if $k < 0$
then $(l + k, l' + c + k)$, $(l + k + (-(c + k)) + c,\ l' + c + k + (-(c + k)))$ is another
way of writing $(l + k, l' + c + k)$, $(l, l')$ that is of the symmetric form with the
"looping factor" $c$ in the first component.


^2 As is conventional, we shall elide the set-former brackets in finite bases.
We can use this finite representation to construct a polynomial-space algorithm for computing
most general uniform semi-unifiers as follows. Let $G$ be an interaction graph representation of
SEI $S$ with consistency mapping $C$. Add the pair $(0, 0)$ to $C(n, n)$ for every node $n$. Maintain
at most two number pairs per node pair. If the set of number pairs $B$ is associated with $(n, n')$,
whose left children are $m$ and $m'$, respectively, then take the number pairs $B'$ associated with
$(m, m')$, compute a new basis $B''$ of $\langle B \cup B' \rangle$, and associate it with $(m, m')$, replacing $B'$.
This corresponds to "executing" rule 1 of Figure 6.9. A similar trick can be applied for rule 2.
Since the remaining rules are independent of the structure of $G$, they are already taken care of by
the fact that the number pairs associated with node pairs are interpreted as bases of consistently
closed relations. The critical part that remains to be shown is how $B''$ is calculated from $B$ and
$B'$.

For three pairs $(l_1, l_2)$, $(l'_1, l'_2)$, $(l''_1, l''_2)$ it is easy to check whether one of them is in the span of
the other two. The only interesting case that has to be treated is if this is not the case. Then,
w.l.o.g., $B = \{(l_1, l_2), (l_1 + k, l_2 + k + c), (l_1 + k', l_2 + k' + c')\}$ where $l_1, l_2, c, c' \ge 0$, $k \ge -c$, $k' \ge -c'$.

Proposition 40 Let $B = \{(l_1, l_2), (l_1 + k, l_2 + k + c), (l_1 + k', l_2 + k' + c')\}$ where $l_1, l_2, c, c' \ge
0$, $k \ge -c$, $k' \ge -c'$.

Then $B' = \{(l_1, l_2), (l_1 + k'', l_2 + k'' + c'')\}$, with $k'' = \min\{k, k'\}$, $c'' = \gcd(c, c')$, is a basis
of $\langle B \rangle$.

Proof:

It is sufficient to show that $\langle B' \rangle = \langle B \rangle$. This is analogous to the proof of theorem
21.
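
The formula of Proposition 40 is easy to transcribe. The Python sketch below is only such a transcription, under the assumption that the second and third pair are already written relative to the first in the form required by the proposition; the function name `basis_of_three` is hypothetical, and the code does not verify that the result spans the same relation.

```python
from math import gcd


def basis_of_three(p0, p1, p2):
    """Return the two-pair set B' of Proposition 40 for
    B = {p0, p1, p2}, where p0 = (l1, l2), p1 = (l1 + k, l2 + k + c)
    and p2 = (l1 + k', l2 + k' + c')."""
    l1, l2 = p0
    # Recover k, c and k', c' from the coordinates of p1 and p2.
    k = p1[0] - l1
    c = (p1[1] - l2) - k
    k2 = p2[0] - l1
    c2 = (p2[1] - l2) - k2
    # Preconditions stated in the proposition.
    if c < 0 or c2 < 0 or k < -c or k2 < -c2:
        raise ValueError("pairs are not in the form required by Proposition 40")
    k_new = min(k, k2)
    c_new = gcd(c, c2)
    return (p0, (l1 + k_new, l2 + k_new + c_new))


# k = k' = 1, c = 4, c' = 6, so k'' = 1 and c'' = gcd(4, 6) = 2.
print(basis_of_three((0, 0), (1, 5), (1, 7)))
```

Applying the function repeatedly, each time folding one more pair into the current two-pair set, mirrors the remark in the text that the construction extends to any finite set of number pairs.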

This proposition shows that it is possible to compute a basis of a consistently closed relation
spanned by three number pairs; of course, this construction can be applied repeatedly to calculate
the basis of any finite set of number pairs. We shall denote the basis of $\langle B \rangle$ computed in this way by $b(B)$.
We can now translate the closure rules of Figure 6.9 to operations on bases of consistently closed
relations and arrive at the following theorem.

Theorem  22    There  is  an  algorithm  A^   that  computes  the  most  general  uniform  semi-unifier 
(in  a  suitable  representation)  of  any  SEI  S  £  ^[Ai,  V)  in  polynomial  space. 

Proof: 

Construct an initial interaction graph $G'$ for $S$. Apply the rewriting steps in algorithm
A^ in Figure 6.11 to $G'$ until convergence. The biggest number occurring in any
consistency set during its execution is bounded by $2^{2m^2}$, where $m$ is the number
of nodes in $G'$ (which does not change). This can be shown by observing that the
rewrite rules guarantee that numbers only decrease unless the cardinality of some
$C(n, n')$ is increased, in which case the biggest number can be at most doubled in the
rewritten interaction graph. An increase of cardinality of one of these sets can happen
at most $2m^2$ times. This shows that A^ uses at most polynomial space during the
first stage. The second stage, checking for a violation of the descendancy check,
is a backtracking algorithm that also uses at most polynomial space. Consequently,
algorithm A^ executes in polynomial space.

We believe that the first stage of algorithm A^ can be further improved to run in polynomial
time, although we cannot see how to speed up the second stage without further simplifying the
interaction graph from the first stage by normalizing it with respect to the rule in Figure 6.12, which has been
proposed by Kapur et al. [54] to arrive at a polynomial-time decision algorithm for uniform
semi-unification by adding the "inverse" of rule 3 to the graph rewriting system in Figure 6.6 (see
(First stage) Apply the following operations to G' until G' does not change any more.

1. For $n, n_1, n_2, n', n'_1, n'_2 \in N$ such that $E(n) = (n_1, n_2)$, $E(n') = (n'_1, n'_2)$,

$C(n_1, n'_1) := b(C(n_1, n'_1) \cup C(n, n'))$
$C(n_2, n'_2) := b(C(n_2, n'_2) \cup C(n, n'))$

2. For $n_0, n_1, n_2, n_3 \in N$, if $(l_{01}, l_{10}) \in C(n_0, n_1)$, $(l_{23}, l_{32}) \in C(n_2, n_3)$,
$(l_{02}, l_{20}) \in C(n_0, n_2)$,
if $d$ is the smallest natural number such that $l_{02} + d \ge l_{01}$ and $l_{20} + d \ge l_{23}$,
then $C(n_1, n_3) := b(C(n_1, n_3) \cup \{(l_{10} + (l_{02} + d - l_{01}),\ l_{32} + (l_{20} + d - l_{23}))\})$.

3. $C(n', n) := b(C(n', n) \cup C^{-1}(n, n'))$
where $C^{-1}(n, n') = \{(l_2, l_1), (l'_2, l'_1)\}$ if $C(n, n') = \{(l_1, l_2), (l'_1, l'_2)\}$.

(Second stage) Execute check(n, 0), with an initially empty stack, for all nodes n and
see whether an error is signaled. If so, the normal interaction graph after the first
stage has no valid interpretation; if not, it has a valid interpretation.

check(n, l) =
    if there is (n, l') in stack then
        if C(n, n) ≠ {(0, 0)} or l' > l then
            signal error and terminate;
        else
            return;
        end if
    else
        if there is (n', l') such that (l, l') ∈ C(n, n') and
                L(n') = f, E(n') = (n'_1, n'_2) then
            push (n, l) onto stack;
            check(n'_1, l');
            check(n'_2, l');
            pop (n, l) off stack;
        else
            return;
        end if
    end if


Figure 6.11: Algorithm A^


If there exist nodes $m_1$, $m_2$, $n_1$, and $n_2$ such that $m_2 \sim n_2$, $m_1 \to m_2$ and $n_1 \to n_2$
then merge the equivalence classes of $m_1$ and $n_1$;

Figure 6.12: Inverse of rule 3a
Figure 6.12). Note that our algorithm permits us to extract a most general uniform semi-unifier
by "running" the interpretation "program" $I$ in the proof of theorem 20.

The inverse of rule 3 (Figure 6.12) is sound but not complete in our sense; it does, however, preserve
semi-unifiability in the uniform case. It does not preserve semi-unifiability for two or more
inequalities and thus is not correct for nonuniform semi-unification. Now arithmetization of
algorithm A', which consists of A and the new rule 3', yields a polynomial-time algorithm.

Theorem 23    Uniform semi-unifiability is polynomial-time decidable.

Proof:     See  [54]. 
Chapter  7

Decidability:  Elementary  Combinatorial  Properties  and  Approaches


Semi-unification  is,  at  present,  not  known  to  be  decidable.  There  have  been  several  attempts  at 
proving  its  decidability  (or  decidability  of  one  of  the  problems  we  have  shown  to  be  polynomial- 
time  equivalent),  but  they  all  failed.  In  this  chapter  we  introduce  graph-theoretic  notions  that 
may  simplify  the  analysis  of  the  combinatorial  properties  of  semi-unification  and  eventually 
lead  to  a  proof  of  decidability.  Semi-unification  is  widely  believed  to  be  decidable;  in  fact,  we 
conjecture  that  algorithm  A  is  uniformly  terminating.  In  the  first  section  of  this  chapter,  we 
present  a  "normalization"  of  executions  of  algorithm  A  that  may  be  helpful  in  getting  good 
insight  into  this  problem.  In  the  second  section  we  present  a  generalization  of  executions  of 
algorithm  A,  called  graph  developments,  that  are  simpler  in  the  sense  that  they  abstract  from 
the  specific  effect  of  the  rules  that  affect  the  equivalence  relation  in  arrow  graphs.  Our  feeling 
is  that  this  generalized  problem  is  still  decidable  and  may  indeed  prove  easier  to  solve  than  the 
more  involved  structure  of  executions  of  algorithm  A. 

7.1      Normalized  Executions 

The  satisfiability  problem  for  arrow  graphs  is  the  problem  of  deciding  whether  there  is  a  valid 
interpretation  for  a  given  arrow  graph.  Since  arrow  graph  representations  can  be  constructed 
efficiently  from  SEI's  it  is  clear  that  semi-unification  is  polynomial-time  reducible  to  satisfiability 
of  arrow  graphs. 

A k-colored arrow graph G = (N, N_F, L, E, A, ~) over A is downward closed if the following
closure rules hold.

1. (∀m, n ∈ N_F, m_1, m_2, n_1, n_2 ∈ N) if L(m) = L(n) = f, E(m) = (m_1, m_2), E(n) = (n_1, n_2)
then m ~ n ⇒ m_i ~ n_i for 1 ≤ i ≤ 2 and m →^j n ⇒ m_i →^j n_i for 1 ≤ i ≤ 2, 1 ≤ j ≤ k;

2. (∀m, m', n, n' ∈ N, 1 ≤ j ≤ k) (m ~ m', m →^j n, m' →^j n' ⇒ n ~ n') and (m →^j n, m ~ m',
n ~ n' ⇒ m' →^j n').

It is easy to see that every arrow graph G has a unique smallest downward closure, closure(G),
which is simply the arrow graph reached by repeatedly applying the above closure rules as rewrite
rules until no longer possible.¹

Downward  closure  preserves  valid  interpretations. 

Proposition 41  I is a valid interpretation of arrow graph G if and only if I is a valid
interpretation of closure(G).

We can factor out the equivalence relation ~ in downward closed arrow graphs to arrive at
an essentially equivalent, but simplified, arrow graph. Specifically, we define

    G/~ = (N/~, N_F/~, L/~, E/~, A/~, =)

where

1. N/~ is the set of equivalence classes of ~; [n]_~ denotes the equivalence class of n ∈ N;

2. N_F/~ is the set of equivalence classes that contain some functor node;

3. (L/~)([n]_~) = f if L(n') = f for some n' ~ n; otherwise (L/~)([n]_~) = α if α is the least
variable contained in any n' ~ n (with respect to a given fixed total order on V);

4. (E/~)([n]_~) = ([n'_1]_~, [n'_2]_~) if n ~ n' and E(n') = (n'_1, n'_2);

5. ([n]_~, [n']_~) ∈ (A/~)_i if and only if (n, n') ∈ A_i for 1 ≤ i ≤ k;

6. = is the trivial equivalence relation on N/~;

if the following two conditions are satisfied:

1. (∀n, n' ∈ N_F) n ~ n' ⇒ L(n) = L(n') (no functor clash);

2. The extended occurs check (rule 4a in Figure 6.6) is not triggered.

If either of these conditions is violated we define G/~ = □ where □ denotes a fixed arrow
graph with no valid interpretation. We call any arrow graph with a trivial equivalence relation
(i.e., only the identity pairs (n, n), n ∈ N(G), are in the equivalence relation) normalized.

Proposition 42  Let G be a downward closed arrow graph with equivalence relation ~. Then
G/~ = □ or

1. G/~ is downward closed; and

2. G/~ is normalized; and

3. any valid interpretation of G canonically induces a valid interpretation of G/~ and vice
versa.

Proof:

(1) and (2) are trivial. For (3) we can verify that for any valid interpretation I of G,
I(n) = I(n') if n ~ n', and consequently I([n]_~) is well-defined; conversely, a valid
interpretation I of G/~ extends to G by simply defining I(n) = I([n]_~).


¹This can be made precise by defining arrow graph morphisms and proving uniqueness and minimality by
induction on the depth (with respect to dag edges) of the arrow graph.
Given any arrow graph G we denote by Ĝ the normalized, downward closed arrow graph
defined by closure(G)/~ where ~ is the equivalence relation in closure(G).

Proposition 43  Ĝ is polynomial-time computable.

Proof: 

A simple adaptation of the union-find based unification algorithm [43, 1] yields an
algorithm that executes in time O(k n α(n, n)) where α is an extremely slow-growing
function (see [115]).
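
The closure and quotient constructions of this section can be sketched in a few lines of Python. The sketch below is only an illustration under our own assumptions: the class name ArrowGraph and all of its methods are hypothetical, functor nodes are binary, variable classes are labeled None instead of by their least variable, and the extended occurs check is omitted. It uses union-find for ~ but iterates the closure rules naively to a fixpoint, so it is polynomial but does not attain the bound stated in the proof above.

```python
class ArrowGraph:
    """Toy k-colored arrow graph with binary functor nodes (illustrative only)."""

    def __init__(self):
        self.parent = {}     # union-find forest representing ~
        self.functor = {}    # functor node -> (label, left child, right child)
        self.arrows = set()  # (color, source, target)

    def add_var(self, n):
        self.parent[n] = n

    def add_functor(self, n, label, left, right):
        self.parent[n] = n
        self.functor[n] = (label, left, right)

    def find(self, n):
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def union(self, m, n):
        """Merge the classes of m and n; report whether anything changed."""
        m, n = self.find(m), self.find(n)
        if m != n:
            self.parent[n] = m
        return m != n

    def has_arrow(self, j, m, n):
        """Is there a j-colored arrow from the class of m to the class of n?"""
        m, n = self.find(m), self.find(n)
        return any(c == j and self.find(a) == m and self.find(b) == n
                   for (c, a, b) in self.arrows)

    def close(self):
        """Apply closure rules 1 and 2 to a fixpoint; False signals a functor clash."""
        changed = True
        while changed:
            changed = False
            rep = {}  # class -> one functor node of that class
            for n, (label, left, right) in self.functor.items():
                other = rep.setdefault(self.find(n), n)
                label2, left2, right2 = self.functor[other]
                if label != label2:
                    return False
                # rule 1: equivalent functor nodes have equivalent children
                changed |= self.union(left, left2)
                changed |= self.union(right, right2)
            target = {}  # (color, source class) -> one target node
            for (j, m, n) in list(self.arrows):
                # rule 2: equally colored arrows out of one class end in one class
                changed |= self.union(target.setdefault((j, self.find(m)), n), n)
            for (j, m, n) in list(self.arrows):
                fm, fn = rep.get(self.find(m)), rep.get(self.find(n))
                if fm is None or fn is None or self.functor[fm][0] != self.functor[fn][0]:
                    continue
                # rule 1: arrows between functor nodes propagate to the children
                for a, b in zip(self.functor[fm][1:], self.functor[fn][1:]):
                    if not self.has_arrow(j, a, b):
                        self.arrows.add((j, a, b))
                        changed = True
        return True

    def quotient(self):
        """Factor out ~: labels, children and arrows on class representatives."""
        labels = {self.find(n): None for n in self.parent}  # None marks a variable class
        kids = {}
        for n, (label, left, right) in self.functor.items():
            labels[self.find(n)] = label
            kids[self.find(n)] = (self.find(left), self.find(right))
        arrows = {(j, self.find(m), self.find(n)) for (j, m, n) in self.arrows}
        return labels, kids, arrows
```

For instance, with m = f(x, y), n = f(u, v), an arrow of color 1 from m to n, and x ~ y, calling close() first propagates the arrow to the children and then identifies u and v, exactly as the two closure rules prescribe. Arrows are stored on arbitrary class members and compared modulo ~, which realizes the second half of rule 2 implicitly.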

We can now define a reduction relation on normalized, downward closed arrow graphs simply
by executing rule 4b (Figure 6.6) with subsequent exhaustive application of rules 1, 2, 3, and 4a,
which corresponds to computing Ĝ' = closure(G')/~ from G' after G has been transformed into G'
by application of rule 4b at some discrepancy. We say G reduces to Ĝ' and write G => Ĝ'.

Proposition 44  Let G be an arrow graph, and let G' be defined as above. Denote the nodes of
G with N, and the nodes of G' with N'. Then for any valid interpretation I of G there is a valid
interpretation I' of G' such that I'|_N = I and, conversely, for every valid interpretation I' of
G', I'|_N is a valid interpretation of G.

Proof:     Obvious 

Note that => defines a reduction relation on downward closed, normalized arrow graphs. The
previous propositions guarantee that this reduction relation preserves valid interpretations. A
sequence (G_1, ..., G_i, ...) of downward closed, normalized arrow graphs is a normalized execution
if G_i => G_{i+1} for i ≥ 1 and, if it is finite, its last element is irreducible. We say a downward
closed, normalized arrow graph G is solvable if there exists a finite normalized execution
(G_1, ..., G_n) such that G = G_1 and G_n ≠ □.

Proposition  45   Semi-unification  is  polynomial-time  reducible  to  arrow  graph  solvability. 

Proof: 

By  the  correctness  of  algorithm  A. 

The reduction relation => on downward closed, normalized arrow graphs effectively "collapses"
the compound effect of exhaustive application of rules 1, 2, and 3 in Figure 6.7 of chapter
6. Note also that the exhaustive application of these rules can be done very efficiently since an
extended occurs check (which subsumes the ordinary occurs check) is only done once, after
rules 1, 2, and 3 are applied exhaustively. The propositions above follow immediately from the
fact that algorithm A is just a graph-theoretic reformulation of the "canonical" SEI-rewriting
system for computing most general semi-unifiers in section 6.2.

Our hope is that this reduction relation, maybe in connection with the combinatorial structure
in the following section, is a possible starting point for a much deeper understanding of the
algebraic and combinatorial structure of executions of algorithm A that will eventually lead to
a proof of uniform termination of A and, consequently, of decidability of semi-unification.

7.2      Graph  Developments 

Inspired by the construction of Kanellakis and Mitchell that shows that ML typing is PSPACE-hard
[53], our intuition is that the reduction rules 1 and 3 incorporate the computational
"intelligence" of algorithm A in that they "steer" the execution, whereas rule 4 and, to a lesser
degree, rule 2 simply create the necessary space resources. For this reason we first introduce the
notion of (arrow) graph developments. Following that, we show that every execution of algorithm A
induces an arrow graph development, the finiteness of which can be "lifted back" to show that
any execution sequence describing A is finite.

Let G = (N, N_F, L, E, A) be a graph.* Define the reduction relation ⟶ by the following
two rules.

1. If there exist m, m', m'', n, n', n'' ∈ N such that L(m) = L(n) = f, E(m) = (m', m''),
E(n) = (n', n'') and m → n, but m' ↛ n' (or m'' ↛ n''), then

G ⟶ G[A := A ∪ {(m', n')}] (or G ⟶ G[A := A ∪ {(m'', n'')}], respectively).

2. (a) If there exist m, n ∈ N such that n is a proper descendant of m and there is a
(possibly empty) arrow path from m to n, then

G ⟶ □;

(b) if rule 2a above does not apply and there exist m, m', m'', n ∈ N such that
L(m) = f, E(m) = (m', m''), L(n) ∈ V, then

G ⟶ G[N := N ∪ {n', n''}, L := L{n ↦ f, n' ↦ f', n'' ↦ f''},
E := E{n ↦ (n', n'')}]

where n' (or n'') is either an old node, n' ∈ N, or a new node, n' ∉ N, and if n'
is a new node then f' = x' for a new variable x', otherwise f' = L(n'); similarly for
n''.

In all these cases the node m is the hinge of the rule application.

*We shall write n → m for (n, m) ∈ A.

Figure 7.1: Graph development rules

In  this  section  we  shall  consider  arrow  graphs  without  an  equivalence  relation  and  with  arrows 
of  only  one  color;  i.e.,  they  consist  of  a  term  graph  and  arrows  (all  of  the  same  color)  only.  For 
convenience'  sake  we  shall  simply  call  them  graphs.  These  graphs  can  be  identified  with  arrow 
graphs  that  have  only  a  trivial  equivalence  relation  on  their  nodes.  In  this  sense  the  notions  of 
descendant,  arrow  path  and  so  forth  carry  over  from  arrow  graphs  to  graphs. 

We shall now introduce a reduction relation, also denoted by ⟶, between graphs. It is
defined by two rules given in Figure 7.1. The surface similarity of these rules with arrow graph
reduction rules 2 and 4 in Figure 6.7 is not coincidental and will be made precise just below.

Graphs  for  which  no  rule  is  applicable  (in  particular  □)  are  normal  graphs. 
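
To make the effect of the rules concrete, the following Python sketch applies rules 1 and 2a of Figure 7.1 to a finite single-colored graph until neither applies. It is an illustration under our own assumptions, not a transcription of algorithm A: the function names are hypothetical, a graph is given by dictionaries labels and children together with a set of arrow pairs, rule 2b (which introduces nodes) is left out, and None stands for □.

```python
def descendants(children, m):
    """Proper descendants of m along child (dag) edges."""
    seen, stack = set(), list(children.get(m, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children.get(n, ()))
    return seen


def arrow_reachable(arrows, m):
    """Nodes reachable from m by a possibly empty arrow path."""
    seen, stack = {m}, [m]
    while stack:
        cur = stack.pop()
        for (a, b) in arrows:
            if a == cur and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def rule1_candidates(labels, children, arrows):
    """Arrows (m', n') that rule 1 may add for some arrow m -> n."""
    out = set()
    for (m, n) in arrows:
        if m in children and n in children and labels[m] == labels[n]:
            for pair in zip(children[m], children[n]):
                if pair not in arrows:
                    out.add(pair)
    return out


def occurs_check_hinges(children, arrows):
    """Nodes m with an arrow path to one of their proper descendants (rule 2a)."""
    return {m for m in children
            if descendants(children, m) & arrow_reachable(arrows, m)}


def develop(labels, children, arrows):
    """Exhaust rules 2a and 1; returns the final arrow set, or None for the failure graph."""
    arrows = set(arrows)
    while True:
        if occurs_check_hinges(children, arrows):
            return None
        todo = rule1_candidates(labels, children, arrows)
        if not todo:
            return arrows
        arrows |= todo  # several applications of rule 1 at once
```

Since no nodes are created without rule 2b, the arrow set can only grow inside N × N and the loop terminates. For a = f(x, y) and b = f(u, v) with a → b the result contains the additional arrows x → u and y → v; for c = f(d, e) with c → d the extended occurs check fires.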

Definition  18   (Graph  development) 

A graph development is a (possibly infinite) sequence 𝒢 = (G_1, ..., G_i, ...) of graphs where
G_{i+1} is derived from G_i by application of one of the rules in Figure 7.1 for all i ≥ 1 and, if 𝒢
is finite, then the last component in 𝒢 is a normal graph.

The limit graph lim 𝒢 of a graph development 𝒢 = (G_1, ..., G_i, ...) where G_i =
(N^i, N_F^i, L^i, E^i, A^i) is □ if 𝒢 is finite and its last component is □; otherwise it is defined by
(N, N_F, L, E, A) where

    N     =  {n : (∃i) n ∈ N^i}
    N_F   =  {n : (∃i) n ∈ N_F^i}
    L(n)  =  f,  if (∃i) L^i(n) = f
             x,  otherwise, and L^i(n) = x for some i
    E(n)  =  (n', n'') if (∃i) E^i(n) = (n', n'').

The first component, G_1, of a graph development 𝒢 = (G_1, ..., G_i, ...) is called the initial
graph of 𝒢. A node in lim 𝒢 or in any of the graphs in 𝒢 is an original (or old) node if it occurs
in the initial graph of 𝒢; otherwise it is a new node.

Every execution of algorithm A whose final, normal arrow graph is not □ defines a graph
development  in  a  canonical  fashion.  Consider  the  final  arrow  graph  G  of  an  execution  and  its 
equivalence  relation.  This  equivalence  relation  can  be  "factored"  out  from  every  arrow  graph  in 
the  execution  leading  up  to  G  in  almost  the  same  way  in  which  normalized  arrow  graphs  are 
formed  from  downward-closed  arrow  graphs  in  section  7.1. 

Let us now consider graph developments whose limit graph is not □. For any 𝒢 =
(G_1, ..., G_i, ...) we can define an equivalence relation on the nodes in 𝒢 and a partial order
on the resulting equivalence classes. For lim 𝒢 = (N, N_F, L, E) define n ≤ n' for
n, n' ∈ N if n → n' (in lim 𝒢) or E(n') = (n, n'') or E(n') = (n'', n) for some n'' ∈ N.
We can take the reflexive-transitive closure of ≤ and then factor out the equivalence relation ≅,
n ≅ n' ⇔ n ≤ ... ≤ n' and n' ≤ ... ≤ n, which defines a partial order, also denoted by ≤, on
equivalence classes of ≅. The equivalence class containing node n shall be denoted by [n].
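
For a finite graph the order just defined can be computed directly. The Python sketch below is only an illustration with hypothetical function names; it assumes the graph is given by a list of nodes, a dictionary children from functor nodes to their pair of children, and a set of arrow pairs. It builds the generating pairs of ≤, closes them reflexively and transitively, and groups mutually related nodes into classes.

```python
def leq_edges(children, arrows):
    """Generating pairs (n, n') with n <= n': an arrow n -> n', or n a child of n'."""
    edges = set(arrows)
    for parent, kids in children.items():
        edges.update((kid, parent) for kid in kids)
    return edges


def up_set(edges, n):
    """All n' with n <= ... <= n' (reflexive-transitive closure)."""
    seen, stack = {n}, [n]
    while stack:
        cur = stack.pop()
        for (a, b) in edges:
            if a == cur and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def classes(nodes, children, arrows):
    """Map every node n to its equivalence class [n]."""
    edges = leq_edges(children, arrows)
    up = {n: up_set(edges, n) for n in nodes}
    return {n: frozenset(m for m in nodes if m in up[n] and n in up[m])
            for n in nodes}
```

With a = f(x, y) and the single arrow x → y every class is a singleton and [x], [y] lie strictly below [a]. Adding instead an arrow a → x puts a and its child x into the same class; this is precisely the configuration in which the extended occurs check applies.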

We call a graph development 𝒢 fair if for every node in 𝒢 that becomes a hinge for a rule
application the corresponding rule is eventually executed.

Proposition 46  Let 𝒢 be a fair graph development with lim 𝒢 = (N, N_F, L, E) ≠ □, and let ≤
be the partial order on ≅-equivalence classes of N defined above.

For all nodes n, n', n'' ∈ N, if E(n) = (n', n'') then [n'] < [n] and [n''] < [n].

Proof:

It is clear by definition that [n'] ≤ [n] and [n''] ≤ [n]. We need to show that
[n] ≰ [n']. Let us assume [n] ≤ [n']. By definition of ≤ there is a sequence of nodes
(n_00, n_01, n_10, n_11, ..., n_k0, n_k1), k ≥ 0, such that n_00 = n, n_k1 = n' and for 0 ≤ i ≤ k
there is a (possibly empty) arrow path from n_i0 to n_i1 and for 0 ≤ i ≤ k − 1 the
node n_i1 is a child of n_(i+1)0. Since 𝒢 is fair we can show by induction on the length
of these sequences that there exists a proper descendant m of n_k0 in 𝒢 such that
there is an arrow path from n' (= n_k1) to m. Consequently there is an arrow path
from n_k0 to m. But this means that n_k0 must be a hinge for applying the "extended
occurs check" rule 2a in a component of 𝒢. Since 𝒢 is fair by assumption this means
that the extended occurs check rule is applied at some point in 𝒢 and consequently
lim 𝒢 = □. But this is in contradiction to our assumption that the limit graph is not
□.

This proposition shows that proper <-inequalities hold between (the equivalence class of) a
child and (the equivalence class of) its parent. This is critically due to the extended occurs check
rule, rule 2a, since the notion of fairness mandates that every rule that can possibly be executed
at some node eventually is. In fact it is easy to give a (necessarily infinite and unfair) graph
development in which the resulting (infinite) limit graph has equivalence classes that contain a
child and its parent.

This separation of equivalence classes along parent-child edges is crucial in the following
lemma. For G = (N, N_F, L, E, A) and C ⊆ N we define Env_G(C) = {(n, n') ∈ A | n ∈ C or
n' ∈ C}.

Lemma 47  Let 𝒢 be a fair graph development with lim 𝒢 = (N, N_F, L, E, A) ≠ □ and let
G' = (N', N'_F, L', E', A') be the initial graph of 𝒢.

For any maximal equivalence class C in lim 𝒢 we have

1. For all n ∈ C, n ∈ N' and n is not a child in G' of any node n' ∈ N'.

2. Env_{lim 𝒢}(C) = Env_{G'}(C).

Proof:

1. By assumption, C is a maximal equivalence class in lim 𝒢 with respect to ≤.
If n ∈ C is not in N', then it must have been introduced by rule 2b since this
is the only rule that adds new nodes. But then n would have to be the child
of some node n' in a component of 𝒢 and, consequently, in lim 𝒢, which, by
proposition 46, would contradict the assumption that the equivalence class, C,
of n is maximal in lim 𝒢. If n were the child of a node n' in G' then, again, C
could not be maximal since n would also be a child of n' in lim 𝒢.

2. By inspection of the graph development rules it is clear that the containment
Env_{lim 𝒢}(C) ⊇ Env_{G'}(C) holds. Assume it is a proper superset. Then a new
arrow, with a node n ∈ C at its head or at its tail, must have been introduced
by rule 1 since this is the only rule that introduces new arrows. But this means
that n has a parent in lim 𝒢 and, again, it follows by proposition 46 that C is
not maximal, contradicting our assumption.

This  lemma  guarantees  that  any  group  of  nodes  that  turns  out  to  be  a  maximal  equivalence 
class  in  the  limit  graph  of  a  fair  graph  development,  all  the  arrows  between  them,  and  all  the 
arrows  whose  head  is  one  of  these  nodes  are  actually  already  present  in  the  initial  graph  of  the 
graph  development.  We  cannot  predict  which  group  that  will  be  by  looking  at  the  initial  graph 
since  rule  2b  can  wildly  pick  any  old  node  for  a  child  (thus  making  that  node  an  element  of  a 
nonmaximal  equivalence  class),  but  the  lemma  guarantees  that  there  exists  one,  no  matter  what 
(literally)  unpredictable  turns  rule  2b  takes.  Note  that  we  have  not  proved  that  the  limit  graph 
of a fair graph development actually has maximal equivalence classes.

Before we present the main theorem we need another lemma. In a graph G the sources of
a node n are defined to be the set of all nodes n' in G such that there is a (possibly empty)
arrow path from n' to n. The independent sources of n are all those sources of n that have only
themselves as a source. Note that every finite graph development is necessarily fair.
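
The two notions can be stated operationally. The following Python sketch is an illustration only, with hypothetical function names; it assumes that the arrows of a finite graph are given as a set of (tail, head) pairs.

```python
def sources(arrows, n):
    """All nodes n' with a possibly empty arrow path from n' to n."""
    seen, stack = {n}, [n]
    while stack:
        cur = stack.pop()
        for (tail, head) in arrows:
            if head == cur and tail not in seen:
                seen.add(tail)
                stack.append(tail)
    return seen


def independent_sources(arrows, n):
    """Those sources of n that have only themselves as a source."""
    return {s for s in sources(arrows, n) if sources(arrows, s) == {s}}
```

For the arrows p → q, q → n and r → n the sources of n are {n, q, p, r}, and its independent sources are p and r, the nodes at which arrow paths into n originate.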

Lemma 48  Let 𝒢 be a finite graph development with lim 𝒢 ≠ □ and initial graph G'. Then, for
any node n in lim 𝒢, all independent sources of n are nodes in G'.

Proof: 

This can be shown by (finite) induction on the index of the component graphs in the
graph development. If we insisted on "normalized" graph developments, in which rule
2b is only executed when none of the other rules is applicable (which is a good idea
anyway since it simplifies the process of looking for hinges for the extended occurs
check rule, rule 2a), this would indeed be straightforward to prove. Since there is
a slight complication in "unnormalized" graph developments, we shall momentarily
generalize the notion of a source. In rule 2b we say there is a "hop" from m' to n'
and from m'' to n''. A node m in G_j is a phantom source of n if there is a sequence of
nodes (n_0, ..., n_k) such that m = n_0, n = n_k, and, for 1 ≤ i ≤ k, there is a hop from
n_{i−1} to n_i or n_{i−1} → n_i in G_j. Independent phantom sources are defined analogously
to independent sources. We claim that for all graphs in a graph development the set
of independent phantom sources is already contained in the initial graph.

The claim holds trivially for the empty prefix of a given graph development. For the
induction step, consider how G_i is obtained from G_{i−1}.

1. If rule 1 is applied to get G_i from G_{i−1}, let us denote the tail of the new arrow
by m and its head by n. The sources of m are added to the sources of every
node n' that n is a source of. Since the independent sources of n' are then a
subset of the independent sources of m and of the independent sources n' had
before the rule application, by induction we can conclude that the independent
sources of n' are contained in the initial graph. (For the other nodes, not
affected by this rule application, nothing changes.)

2. If rule 2b is applied and no new node is introduced the claim remains true
trivially. If a new node n is introduced, then there is a hop from a node m
already in G_{i−1} to n, and the claim remains true.

Since 𝒢 is a finite graph development the claim holds true for the final graph of
𝒢. But, in the final graph, for every hop there is also a corresponding arrow, and
consequently, the set of independent phantom sources is also the set of independent
sources. This proves the lemma.

Let us now define the size of a graph, |G|, simply as the number of nodes it contains. (The
size of □ is undefined.)

Lemma 49  For every finite graph development 𝒢 with lim 𝒢 ≠ □ and |lim 𝒢| = s whose initial
graph G' has size |G'| = t > 1 and that has a maximal equivalence class E of size k with a node
n ∈ E that is functor labeled in the initial graph of 𝒢 there is a graph development 𝒢_∞ with
lim 𝒢_∞ ≠ □ and |lim 𝒢_∞| = s − k such that the initial graph of 𝒢_∞ has size t − k.

Proof: 

Since 𝒢 is finite, it is fair, and its limit graph has a maximal equivalence class
C. Now, by assumption there is a maximal equivalence class E with a functor labeled
node n. Let us only treat the case where k = 1; i.e., n is the only node in E. Since n
is functor labeled in the initial graph of 𝒢, there are children n', n'' of n. Now, let
N' and N'' be the independent sources of n' and n'', respectively, in the limit graph. By
lemma 48, all elements of N' and N'' are also in the initial graph of 𝒢. Place arrows
from every independent source of n' to n' and from every independent source of n'' to
n'' in the initial graph of 𝒢, possibly adding new arrows, and delete node n along
with the edges to its children. This results in a new initial graph of size t − 1.
Now we can "simulate" 𝒢 on the smaller initial graph by simply copying the steps
from 𝒢 that do not involve, directly or indirectly, node n and, otherwise, substituting
steps involving some of the added arrows whenever node n is involved.


If  we  could  establish  a  (recursive)  lower  bound  (as  a  function  of  the  size  of  the  limit  graph  and 
possibly  the  size  of  the  initial  graph)  on  the  size  of  the  limit  graph  of  some  graph  development 
with  an  initial  graph  that  has  fewer  nodes  than  the  initial  graph  of  any  given  finite  graph 
development, even in the case where all nodes in maximal equivalence classes are variable labeled
in  the  initial  graph,  then  we  could  prove,  by  induction,  an  upper  bound  on  the  size  of  any  limit 
graph  as  a  function  of  the  size  of  the  initial  graph.  This  is  so  since  any  graph  development 
on  an  initial  graph  of  size  1  has  a  limit  graph  of  size  1.  This  would  establish  decidability  of 
semi-unification  since  the  existence  of  an  infinite  execution  of  algorithm  A  induces  an  infinite 
graph  development. 

We might be tempted to "loosen" the notion of graph development even more by requiring
a constant upper bound on the number of outarrows any node in a graph can have, but allowing
arbitrary insertion of arrows, not only in the case of rule 1. But then it is fairly easy to
construct an infinite graph development as long as at least two outarrows are permissible. Since
a generalization of executions of algorithm A, as the notion of graph developments is, is only
sensible if it admits a proof of termination by showing that only finite developments are possible,
this further generalization is useless.
Chapter 8

Implications  for  Programming 
Language  Design 


In this chapter we attempt to shed some light on a somewhat puzzling observation: that polymorphic
type inference is theoretically intractable and, as such, should be only marginally usable,
yet experience with declaration-free polymorphic languages bears witness to its practical utility.
In  section  8.1  we  offer  some  general  considerations  to  suggest  that  the  apparent  practicality  of 
type  inference  is  not  just  a  lucky  coincidence,  and  in  section  8.2  we  briefly  formalize  some  of  our 
considerations. 

8.1      Theoretical   Intractability    and    Practical    Utility    of 
Polymorphic  Type  Inference 

Some  of  the  results  of  the  previous  chapters  seem  to  suggest  that  polymorphic  type  inference  (as 
modeled  by  the  Mycroft  Calculus)  has  no  place  in  programming  language  design.  After  all,  the 
type  inference  problem  is  at  least  PSPACE-hard,  which  is  already  beyond  the  point  of  what  is 
conventionally  considered  tractable,  and  likely  it  is  much  harder  than  that:  At  present  even  the 
decidability  question  is  not  solved. 

On  the  other  hand,  some  theoretical  results  and  preliminary  practical  experience  suggest  that 
this  evaluation  may  yet  be  too  pessimistic. 

First  of  all,  the  principal  typing  property  of  the  Mycroft  Calculus  guarantees  a  well-defined 
notion  of  what  the  typing  of  a  program  should  be,  and  this  notion  can  very  intuitively  be 
interpreted  as  the  "most  general"  typing  possible.  Even  though,  at  this  time,  the  decidability 
of both the Mycroft Calculus and the (implicit) Second Order λ-calculus is open, the Mycroft
Calculus has the appealing principal typing property, which is in contrast to the Second Order
λ-calculus where no good notion of "principality" for a λ-expression is known.

Secondly, there is a relatively simple algorithm, algorithm A, for computing principal typings
(in the more general sense of computing typing derivations in the "syntax-oriented" version of the
Mycroft Calculus) that, due to the principal typing property, does not necessitate any backtracking
or other complicated control mechanisms. This can be seen as a sign of "implementability"
and as a preliminary indicator that the type inference problem may prove realistically usable
since many problem instances will admit rapid computation of their principal types.

Thirdly,  languages  such  as  ML,  Miranda,  and  B  have  been  in  use  for  several  years  now, 
and  the  type  checking  phases  in  these  systems  have  been  sufficiently  efficient  in  actual  usage  to 
help promulgate, for about ten years, the myth that ML type checking is theoretically efficient
in the sense that it was believed to have a worst-case polynomial running time of low degree.
The fact that B's type inference algorithm is actually incomplete (with respect to B's typing
discipline), but that this apparently hadn't been noticed, only corroborates our appraisal that
type inference problems encountered in actual programming practice are of the kind that admit
rapid computation of principal types or rapid detection of type errors. Of course, since the
polymorphic languages in question are still used rather infrequently, it is too early to give much
weight  to  these  empirical  observations.  We  shall  attempt  to  argue,  though,  that  the  apparent 
practicality  of  polymorphic  type  inference  in  the  face  of  theoretical  infeasibility  results  is  not  a 
random  phenomenon. 

A  conventional  remedy  for  eliminating  problems  with  type  inference  is  to  mandate  explicit, 
fully  typed  declarations  of  variables,  parameters  and  other  basic  syntactic  units.  Observe,  for 
example, that type checking in the "explicit" Second Order λ-calculus is easy in the sense that
there is a fast polynomial time algorithm for checking the type correctness of a fully typed
λ-expression. Applying this sort of remedy to the Mycroft Calculus highlights, though, why type
checking  (with  explicit  type  information  embedded  in  the  program)  is  no  more  "practical"  than 
type  inference  (with  no  or  only  optional  type  information  in  the  program). 

The culprit for the theoretical intractability of the Mycroft Calculus (and the Milner Calculus)
is the fact that the type information of a program can be extraordinarily bigger than the untyped
program; in particular, it is at least exponentially bigger [53]. Now, writing a 200-line (untyped)
program whose principal type is bigger (measured, for example, in terms of the "tree" size of
the final arrow graph of the corresponding semi-unification problem¹) than the number of atoms
in the universe is no more impractical than writing the program with this typing information
in the first place. Even though both these cases seem to have the same "intuitive" complexity
they are treated very differently in conventional complexity analysis since the two input sizes
are dramatically different.
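
The gap between the two size measures can be illustrated with the familiar pairing chain x_0 : a, x_{i+1} = (x_i, x_i). This is a standard textbook illustration, not the construction of [53], and the function names below are hypothetical. The Python sketch represents the type of x_n as a shared structure and compares its size as a dag with its size once "blown up" into a tree.

```python
def pair_chain(n):
    """Type of x_n for x_0 : a and x_{i+1} = (x_i, x_i), with shared subterms."""
    t = ("var", "a")
    for _ in range(n):
        t = ("pair", t, t)  # both components are the very same object
    return t


def tree_size(t, memo=None):
    """Number of nodes when the type is written out as a tree."""
    memo = {} if memo is None else memo
    if id(t) not in memo:
        memo[id(t)] = 1 + sum(tree_size(c, memo) for c in t[1:] if isinstance(c, tuple))
    return memo[id(t)]


def dag_size(t, seen=None):
    """Number of distinct nodes when common subterms are shared."""
    seen = set() if seen is None else seen
    if id(t) in seen:
        return 0
    seen.add(id(t))
    return 1 + sum(dag_size(c, seen) for c in t[1:] if isinstance(c, tuple))
```

Here dag_size(pair_chain(n)) is n + 1 while tree_size(pair_chain(n)) is 2^(n+1) − 1, so a chain of 300 definitions already has a type whose tree size exceeds 10^80.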

This  may  be  seen  as  a  plea  to  measure  complexity  in  terms  of  the  sizes  of  the  input  program 
and  its  computed  principal  type.  This  would  permit  comparison  of  the  efficiency  of  different 
(sound  and  complete)  type  inference  algorithms  by  comparing  their  performance  on  typable 
inputs,  even  in  the  case  where  they  don't  terminate  for  some  untypable  inputs.  Yet  this  is  not 
quite  satisfactory  in  explaining  the  apparent  practicality  of  type  inference.  In  particular  it  does 
not  question  the  "legitimacy"  of  a  short  program  that  has  a  typing  of  inconceivable  size. 

We feel that the formalization of type inference in logical calculi has failed to take the
intentional character of types and typings into account. Types and typings are generally viewed
as abstractions of the behavior of programs and their parts, and, by analogy to types and program
behaviors, type descriptions are meant to be abstractions of the programs themselves. If the
complete inferred type information of a program is exponentially bigger than the (untyped)
program itself, we think it unreasonable to say the type information is an abstract description of
the program. Either the type description mechanism is inadequate for capturing the intended


¹Measuring the size of type information in this way can be justified as follows. When admitting (or
requiring) explicit type information in programs, this type information is presented by type expressions
of the kind we have used, and by no other mechanism that might conceivably encode type information in some
other, possibly more compact way. Since the "size of the input" is usually counted as the number of symbols
in the input (with or without taking bit-complexity into account), this amounts to determining the size of
all explicit typing information in a program as the sum of the string sizes of the type expressions occurring
in it. Since every part of a final arrow graph corresponding to (Mycroft Calculus) type inference for program
e is represented in a type expression occurring in the full typing for e, this full typing information is at
least as big as the tree size of the final arrow graph; i.e., the number of nodes of the final arrow graph once
it is "blown up" into a tree (or forest). If the fully typed program can be written with type abbreviations
of the sort lettype t = int → int → int in ..., then the type information can be represented in,
asymptotically, the same space as the size of the final arrow graph. But this has the disadvantage that
principal types are not necessarily the "smallest types", and then determining resource-bounded typability
becomes difficult again.
abstraction of behavior or the program at hand does not have a suitable abstract description of its
behavior.  The  first  explanation  points  toward  a  problem  with  the  whole  language,  an  issue  that 
will  have  to  be  addressed  by  language  designers.  Given  a  fixed  static  typing  discipline,  however, 
and  its  implicit  insistence  that  only  behavior  that  is  expressible  in  it  should  be  considered 
desirable,  the  second  explanation  should  be  interpreted  as  saying  that  the  program  at  hand  has 
no  "reasonable"  abstract  description  of  its  behavior  and  thus  should  be  considered  unacceptable 
—  type-incorrect. 

Theoretical type inference calculi are motivated by extensional considerations: two descriptions
are considered completely interchangeable if they denote the same semantic objects, regardless
of any "syntactic" properties of the descriptions (such as the size). Consequently, the
motivation of type descriptions as syntactic abstractions of programs (and not only program
behavior) is lost in the formalization of "practical" type inference by typed λ-calculi. If we try
to recapture some of this connection by requiring that a λ-expression e only be considered
"effectively well-typed" whenever it is typable in the sense of the Mycroft Calculus and its principal
type is at most polynomially bigger than e itself, then it is easy to see that effective well-typing
is (theoretically) feasible. This is made precise in section 8.2.
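
To illustrate the kind of program that effective well-typing excludes, the following minimal sketch compares program size with type size for an illustrative family of programs. The family (let d = lambda x . pair x x in d ( d ( ... z ... ) ), with z assumed to be a free variable of some monotype), the tokenization used to count symbols, and the bound p(n) = n^2 are all assumptions made for this example and are not taken from the text.

```python
def program_tokens(k):
    """Tokens of: let d = lambda x . pair x x in d ( d ( ... z ... ) ), with k uses of d."""
    header = ["let", "d", "=", "lambda", "x", ".", "pair", "x", "x", "in"]
    return header + ["d", "("] * k + ["z"] + [")"] * k


def type_tree_size(k):
    """Tree size of the result type: T(0) = 1 (the type of z), T(k) = 2*T(k-1) + 1."""
    size = 1
    for _ in range(k):
        size = 2 * size + 1
    return size


def effectively_well_typed(k, p=lambda n: n * n):
    """True if the type has at most p(program size) symbols; p is an assumed bound."""
    return type_tree_size(k) <= p(len(program_tokens(k)))


if __name__ == "__main__":
    for k in (3, 9, 10, 20):
        n = len(program_tokens(k))
        print(k, n, type_tree_size(k), effectively_well_typed(k))
```

The program grows linearly in k while the type grows exponentially, so for any fixed polynomial bound only finitely many members of this family remain acceptable.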

Unfortunately, this does not explain the significance, if any, of the extended occurs check in
algorithm A that we conjecture makes A a uniformly terminating algorithm for semi-unification.
If resource bounds on the sizes of typings are given we could run A (or Meertens' algorithm
AA or Mycroft's algorithm) either until a principal type is found or the resource bounds
are exceeded. It appears that, in practice, this check will catch many typing errors early on
without exhausting the possibly big resource bounds. As a matter of principle, it seems that
the requirement of resource bounds in type systems is a bad idea^, whereas they appear to be a
good property of a type system. In other words, it is preferable to devise a syntax-directed type
system whose axioms and rules guarantee resource-boundedness instead of explicitly imposing a
global restriction that mandates explicit resource bounds. We think this is a problem worthy of
attention in the type system design arena, but not so much in the area of programming language
semantics. After all, static typing disciplines are semantically incomplete anyway (that is, there
are programs that are considered statically type-incorrect even though they would never run into
a type-incompatibility at run-time), and resource-bounded static typing systems are just "a tad
more" incomplete.

If we consider, in general, (derivable) typings as "witnesses" to the fact that a program is
well-typed, then typing problems whose witnesses are required to be polynomial-sized fall into
two main complexity classes: P and NP.^ This is so since we assume that any reasonable typing
discipline has a polynomial time type checking problem for programs that are completely
decorated with typing information. If we consider the "typing" problem'' of determining whether
there is an assignment of (polynomial-sized) type expressions to function definitions in a language
with Ada-style overloading, but without explicit type declarations (Ada requires such explicit
declarations), it can be shown that this problem is NP-complete [1, exercise 6.25], whereas the
resource-bounded polymorphic type inference problem is in P. This lends some technical
expression to the intuition that "overload resolution" as above is much harder than polymorphic type
inference; also, in practical terms, since overload resolution requires a backtracking algorithm,
polymorphic type inference should be expected to fare much better in practice than this liberal
sort of overload resolution. Note also that overload resolution has no principal typing property.


'Imagine error messages of the sort "Well, so far everything was okay, but this type expression here is a little
bit too big."

^We  make  the  standard  assumption  that  P  ^  NP. 

*Some  people  would  not  consider  this  overload  resolution  problem  an  example  of  a  typing  problem. 



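Let $A$ range over type environments; $x$ over variables; $e, e'$ over λ-expressions; $\alpha$ over type
variables; $\tau, \tau'$ over monotypes; $\sigma, \sigma'$ over polytypes. The following are type inference axiom and
rule schemes.


Name      Axiom/rule

(TAUT)    $A\{x : \sigma\} \supset x : \sigma$

(GEN)     $\dfrac{A \supset e : \sigma}{A \supset e : \forall\alpha.\sigma}$   ($\alpha$ not free in $A$)

(INST)    $\dfrac{A \supset e : \forall\alpha.\sigma}{A \supset e : \sigma[\tau/\alpha]}$

(ABS)     $\dfrac{A\{x : \tau'\} \supset e : \tau}{A \supset \lambda x : \tau'.e : \tau' \to \tau}$

(APPL)    $\dfrac{A \supset e : \tau' \to \tau \qquad A \supset e' : \tau'}{A \supset (e\,e') : \tau}$

(LET-P)   $\dfrac{A \supset e : \sigma \qquad A\{x : \sigma\} \supset e' : \sigma'}{A \supset \mathbf{let}\ x : \sigma = e\ \mathbf{in}\ e' : \sigma'}$

(FIX-P)   $\dfrac{A\{x : \sigma\} \supset e : \sigma}{A \supset \mathbf{fix}\ x : \sigma.e : \sigma}$

Table 8.1: Type inference axioms and rules for explicit Mycroft Calculus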

8.2      Resource-Bounded  Polymorphic  Type  Inference 

Consider the type inference system in Table 8.1, which we shall call the explicit Mycroft Calculus.
We can define notions of typability and type inference as usual. Typed λ-expressions are
defined by the grammar

$e ::= x \mid \lambda x : \tau.e \mid (e\,e') \mid \mathbf{let}\ x : \sigma = e'\ \mathbf{in}\ e \mid \mathbf{fix}\ x : \sigma.e$

where $\tau$ ranges over monotypes, and $\sigma$ over polytypes, as usual. For every typed λ-expression
$e$ there is a unique underlying untyped λ-expression, $\bar{e}$, derived by erasing all mention of types
in the typed λ-expression (and all colons); $e$ is called a typed version of $\bar{e}$. Clearly, every typed
λ-expression has a principal type in the explicit Mycroft Calculus with respect to a given type
assignment. The following proposition should not come as a surprise.

Proposition 50 There is a polynomial time algorithm for computing the principal type of a
typed λ-expression or indicating untypability.
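
The flavor of such an algorithm can be seen in the following minimal sketch, which is not the algorithm behind the proposition. It assumes a simplification: all annotations are monotypes, so the (GEN) and (INST) rules are never needed. The tuple encoding of terms and types and the function name type_of are hypothetical choices made for this example.

```python
def type_of(env, e):
    """Return the monotype of the annotated term e under env, or None if untypable.

    Types are strings (base types) or ("->", argument, result) tuples.
    Assumption: annotations are monotypes, so (GEN) and (INST) are not modeled.
    """
    tag = e[0]
    if tag == "var":                       # (TAUT): ("var", x)
        return env.get(e[1])
    if tag == "lam":                       # (ABS): ("lam", x, t_arg, body)
        _, x, t_arg, body = e
        t_res = type_of({**env, x: t_arg}, body)
        return None if t_res is None else ("->", t_arg, t_res)
    if tag == "app":                       # (APPL): ("app", function, argument)
        t_fun, t_arg = type_of(env, e[1]), type_of(env, e[2])
        if t_arg is None or not isinstance(t_fun, tuple) or t_fun[0] != "->":
            return None
        return t_fun[2] if t_fun[1] == t_arg else None
    if tag == "let":                       # (LET-P): ("let", x, t, bound, body)
        _, x, t, bound, body = e
        if type_of(env, bound) != t:
            return None
        return type_of({**env, x: t}, body)
    if tag == "fix":                       # (FIX-P): ("fix", x, t, body)
        _, x, t, body = e
        return t if type_of({**env, x: t}, body) == t else None
    raise ValueError(f"unknown term: {e!r}")
```

Each subterm is visited once and every comparison is between types that are assembled from annotations already present in the input, which is why explicit annotations keep checking cheap in this restricted setting.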
We can now formally define a resource-bounded restriction of the Mycroft Calculus. Let $p$ be
a fixed polynomial of one variable, let $|e|$ be the number of symbols in a typed or untyped
λ-expression $e$, and let eMM stand for the explicit Mycroft Calculus. Writing $\bar{e}$ for the
untyped λ-expression underlying a typed λ-expression $e$, define

$MM^p = \{\bar{e} : \exists A, \sigma \mid eMM \vdash A \supset e : \sigma \text{ and } |e| \le p(|\bar{e}|)\}$

A simple way to think about this set is to recognize that, if $A \supset e : \sigma$ is derivable in
eMM, then $A \supset \bar{e} : \sigma$ is derivable in MM. The second requirement encodes the fact that $MM^p$
considers only those untyped λ-expressions type-correct that have a typed equivalent whose type
information is at most polynomially bigger than the untyped λ-expression itself.
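
The size condition in this definition can be made concrete with a small sketch of type erasure and symbol counting. The tuple encoding of terms, the particular way of counting symbols (one per variable occurrence, application node, and type constructor, two per binder), and the names erase, size, and within_bound are assumptions made for this example; the text fixes only that sizes are counted in symbols.

```python
def erase(e):
    """Drop all type annotations from a typed term (the underlying untyped term)."""
    tag = e[0]
    if tag == "var":                       # ("var", x)
        return e
    if tag == "app":                       # ("app", function, argument)
        return ("app", erase(e[1]), erase(e[2]))
    if tag in ("lam", "fix"):              # (tag, x, annotation, body)
        return (tag, e[1], None, erase(e[3]))
    if tag == "let":                       # ("let", x, annotation, bound, body)
        return ("let", e[1], None, erase(e[3]), erase(e[4]))
    raise ValueError(f"unknown term: {e!r}")


def type_size(t):
    """Symbols in a type: 0 for a missing annotation, 1 per name or constructor."""
    if t is None:
        return 0
    if isinstance(t, str):
        return 1
    return 1 + sum(type_size(s) for s in t[1:])


def size(e):
    """Symbols in a typed or untyped term under the assumed counting convention."""
    tag = e[0]
    if tag == "var":
        return 1
    if tag == "app":
        return 1 + size(e[1]) + size(e[2])
    # Binders: keyword + bound variable + annotation + subterms.
    return 2 + type_size(e[2]) + sum(size(s) for s in e[3:])


def within_bound(e, p):
    """Check the size condition: typed size at most p(size of the erased term)."""
    return size(e) <= p(size(erase(e)))
```

For instance, under this convention the annotated identity ("lam", "x", "int", ("var", "x")) has size 4 and its erasure has size 3, so it satisfies the condition for p(n) = 2n but not for p(n) = n.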

Theorem 24 $MM^p$ is polynomial-time decidable.

Proof: 

Execution  of  rule  4  in  algorithm  A  only  makes  the  tree  size  of  the  initial  arrow  graph 
properly  bigger.  Since  the  other  rules  cannot  reduce  the  tree  size  of  the  arrow  graph 
(note  though,  that  they  can  reduce  the  number  of  equivalence  classes  in  the  arrow 
graph)  and  they  can  be  executed  at  most  polynomially  many  times  with  respect 
to  the  "current"  arrow  graph  without  forcing  application  of  rule  4,  and  since  the 
tree  size  of  the  arrow  graph  can  be  computed  in  time  polynomial  in  the  number  of 
nodes  in  the  arrow  graph,  it  follows  that  rule  4  can  be  applied  at  most  polynomially 
many times without exceeding the bound given by p. Consequently, computing the
"principal" typed version of an untyped λ-expression can be done in polynomial time, and
since every other typed version of it that satisfies the typing rules is at least as big
as  the  principal  one,  this  proves  the  theorem. 
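
The step of the proof that relies on computing tree sizes efficiently can be sketched as follows. This is a minimal illustration that assumes the term structure of the arrow graph is given as an acyclic graph mapping each node to its list of children; arrows and equivalence classes are not modeled, and the names doubling_dag, tree_sizes, and within_budget are hypothetical.

```python
def doubling_dag(k):
    """Example graph: node i has two edges to node i-1, so unfolding doubles at each level."""
    dag = {0: []}
    for i in range(1, k + 1):
        dag[i] = [i - 1, i - 1]
    return dag


def tree_sizes(dag):
    """Map each node to the size of the tree it unfolds to, visiting each node once."""
    memo = {}

    def visit(node):
        if node not in memo:
            memo[node] = 1 + sum(visit(child) for child in dag[node])
        return memo[node]

    for node in dag:
        visit(node)
    return memo


def within_budget(dag, root, bound):
    """True if the tree unfolded from root has at most `bound` nodes."""
    return tree_sizes(dag)[root] <= bound
```

Because results are memoized per node, the work is proportional to the number of nodes and edges of the shared graph even when the unfolded tree is exponentially larger, as it is for doubling_dag. A resource-bounded run could call within_budget after each size-increasing step and stop as soon as the bound is exceeded.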

It  would  be  interesting  to  see  whether  this  theorem  also  holds  true  if  (monomorphic)  type 
abbreviations of the form lettype t = τ in ... are allowed.